DURATION ANALYSIS OF FLEET DYNAMICS
Garth Holloway, University of Reading; David Tomberlin, NOAA Fisheries

ABSTRACT

Though long a standard technique in engineering and medical research, duration or survival analysis has become common in economics only in recent decades, and in fisheries economics we are aware of only one previous study. In this paper, we demonstrate the usefulness of duration analysis for understanding fleet dynamics, specifically for predicting fishers' exit decisions. The analysis explores the dependency of these decisions on the length of time during which a fisher has been active and on boat characteristics. We emphasize the development of a Bayesian approach to duration model estimation, and, through an application to the US Pacific salmon fishery, we show the policy relevance of duration analysis to limited-entry fisheries in particular.

Keywords: Duration, Bayesian Estimation, Exit, Commercial Salmon Fishery

INTRODUCTION

Fishers' participation decisions are key to fisheries management, since they reflect conditions in the fishery and also influence both fish stocks and the availability of rents in the fishery. One approach to understanding these participation decisions is duration analysis, also known as survival analysis (in biomedical research) and failure time analysis (in engineering). Duration analysis is a statistical approach to understanding the length of time during which a unit (e.g., a boat) stays in a particular state (e.g., active in a fishery), as well as estimating the probability that a unit will leave that state in a subsequent period. In the context of fisheries, duration analysis is potentially quite useful because it sheds light on the factors that influence participation in the fishery and provides a means of estimating the probability that a given boat will persist in fishing or will exit, information that can help in predicting future effort.
Here, we apply duration analysis to the California commercial salmon fleet to demonstrate the utility of the approach in fisheries analysis. Specifically, we suppose that the length of time during which a boat is active in the California salmon fishery depends on four factors: the boat's average annual revenue from salmon, its share of in-season revenue derived from salmon, its length, and the average number of ports per year at which it lands salmon. We develop and estimate models for two distributions of boats' durations in the fishery, exponential and Weibull. The exponential model is the simplest commonly used distribution, characterized by a constant hazard rate (the probability of exit at time t+1, given participation up to time t). The Weibull model is a generalization in which the hazard rate may be increasing or decreasing with time, as might be the case, for example, if years of participation in a fishery affect the probability of exit independently of the other explanatory variables included in the model. We adopt a Bayesian estimation framework, using a Metropolis-Hastings step within a Gibbs sampler to generate parameter estimates based on simulation of the posterior distribution. Because this approach to duration estimation is less common than frequentist approaches, the paper includes a detailed description of the estimation procedure.

Duration models, as a framework for investigating the factors influencing exit (or entry) in a fishery, are potentially of significant benefit in fisheries management. Understanding the influence of factors such as boat characteristics on participation decisions is helpful in its own right, and may cast light on such issues as changes in fleet composition over time (see [1]) or future fishing effort. In the context of our study fishery, our interest is motivated by the apparent hardening of the fleet into a stable core group after a long period of large-scale exit. More generally, the approach may prove useful in addressing common fisheries management questions such as the effect of area closures on participation and the likely response of latent effort to fish stock recovery. The next section provides a brief description of the California commercial salmon fishery and our data set. We then detail the econometric models used, present preliminary results, and conclude with observations on the method and possible applications in fisheries management.

STUDY FISHERY AND DATA

The California commercial salmon fishery is a limited-entry fishery dominated by small trollers with a crew of one or two persons, most commonly owner-operators. The fishery takes place during the summer months (generally May-September) and is subject to a variety of regulations including time and area closures and minimum sizes. Prior to the 1970s, the fishery supported a large number of fishermen who operated under essentially open-access conditions, but concern over declining fish stocks led to the imposition of a limited-entry regime in the early 1980s. Since that time, participation in the fishery has required a salmon vessel permit, which must be renewed each year. The listing of some California salmon stocks under federal and state endangered species laws has further restricted the activities of commercial salmon fishermen. While catch and revenues have fluctuated significantly over the last two decades, the overall story is one of declining participation and revenues until the latter 1990s, when the fishery entered a period of relative stability (Figure 1).

Figure 1. Number of boats landing salmon in California and fleet real salmon revenue (yr 2000 $US * 10^6).
While a total of 6541 boats report landing salmon in California at least once during our study period, in this paper we include in our data set only the 3779 boats with annual revenue time series that are uninterrupted by periods of non-participation. Accounting for boats that switch in and out of the fishery is a significant econometric complication, and we can develop our argument here more easily by excluding them from our study. It's worth noting that the 3779 boats in our data set account for only 34% of total salmon revenues in the state over the study period, suggesting that the behavior of boats that switch in and out of the fishery opportunistically is an important topic for later work.
Table 1 gives summary statistics on the variables used in our analysis. The dependent variable is duration, or the number of consecutive years in which a boat reports landing salmon in California. The covariates are boat length, average ports per year in which a boat landed salmon, real average salmon revenue per year, and the proportion of each boat's salmon revenue, over the years in which it landed salmon, to each boat's total revenue during salmon seasons over those years. Boat length appears as a proxy for a boat's catching power and its ability to pursue other fisheries (we have far more data on length than on tonnage). Average ports per year in which a boat lands salmon is included as a measure of mobility, which presumably makes survival in the fishery more likely. Average salmon revenue per year, to the extent that it reflects a more profitable operation, might be expected to have a positive effect on duration. The share of a boat's total revenue due to salmon might reflect a higher degree of dependence on the species, hence a reluctance to exit the fishery. We have not included time-varying covariates, such as ocean conditions or regulations, leaving that for later work.

Table 1. Summary statistics on boats used in the analysis (mean, standard deviation, minimum, and maximum reported for: Years of Salmon Landings; Boat Length (ft); Avg. Ports per Year with Salmon Landings; Avg. Salmon Rev. per Year (yr 2000 USD); Share of Salmon in Total Boat Revenue).

Figure 2 shows the distribution of participation duration among our 3779 boats. While by far the most common duration is a single year, every participation period possible during our study horizon is represented, with 75 boats (about 2% of our study fleet) remaining in the fishery throughout the study period. Figure 3 shows the distribution of starting and ending years for our study fleet.
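The duration variable described above can be constructed directly from landings records. The sketch below uses made-up boat identifiers and years (not the paper's data) and relies on the study-fleet restriction that participation histories are uninterrupted, so duration is simply last year minus first year plus one:

```python
import pandas as pd

# Hypothetical landings records: one row per (boat, year with salmon
# landings). Because the study fleet has uninterrupted participation
# histories, duration = last year - first year + 1.
records = pd.DataFrame({
    "boat": ["A", "A", "A", "B", "C", "C"],
    "year": [1981, 1982, 1983, 1985, 1990, 1991],
})

durations = records.groupby("boat")["year"].agg(lambda y: y.max() - y.min() + 1)
print(durations.to_dict())  # {'A': 3, 'B': 1, 'C': 2}
```

For boats that enter and exit repeatedly, one would instead have to identify each uninterrupted spell separately, which is exactly the complication the paper defers to later work.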
Most entry and exit in the fishery took place during the 1980s, after which there has been far less turnover, due at least in part to the earlier drop in participation. Figures 2 and 3 together demonstrate censoring in the data, which is prevalent in duration studies and will strongly influence the econometric development of the next section. In this case we have both left-censoring of the data, since our data set only contains information on landings from 1981 on, and right-censoring of the data, since the total length of time during which still-active boats fish is unobservable.

ECONOMETRIC APPROACH

To answer questions about the duration of fisher enterprises, we first consider the probability of observing an enterprise of length t. We assume that t, if observed, is a realization from a random process f(t|θ), where f(·) denotes a probability density function (pdf) for the random variable T and θ ≡ (θ_1, θ_2, ..., θ_M) denotes an M-vector of parameters conditioning f(·). It follows that the probability of observing a realization no greater than t is

F(t) = ∫_0^t f(s|θ) ds = Prob(T ≤ t). (Eq. 1)

As noted by Greene [2, p. 986], we will usually be more interested in the probability that the spell is of length at least as great as t, which is given by the survival function

S(t) = 1 − F(t) = Prob(T > t). (Eq. 2)
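The relationship between the distribution function and the survival function can be checked numerically. The sketch below uses an illustrative exponential duration distribution with a made-up rate (not an estimate from the paper):

```python
import numpy as np
from scipy import stats

# Illustrative exponential duration distribution; lam is a made-up
# rate, not a value estimated in the paper.
lam = 0.2
t = 5.0

F = stats.expon.cdf(t, scale=1.0 / lam)  # F(t) = Prob(T <= t), Eq. 1
S = stats.expon.sf(t, scale=1.0 / lam)   # S(t) = 1 - F(t) = Prob(T > t), Eq. 2

assert abs((F + S) - 1.0) < 1e-12        # survival and cdf sum to one
print(S)                                 # exp(-lam * t) = exp(-1)
```

SciPy parameterizes the exponential by its scale 1/λ rather than its rate, hence the `scale=1.0 / lam` argument.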
IIFET 2006 Portsmouth Proceedings

Figure 2. Duration of participation in the California commercial salmon fishery by boats with uninterrupted participation histories (n = 3779).

Figure 3. First and last years of reported salmon landings by boats with uninterrupted participation histories (n = 3779).
Correspondingly, we are able to compute the instantaneous probability of failure, given that a vessel has been in operation at least as long as t. This probability is referred to as the hazard rate, giving rise to the hazard function

H(t) = lim_{Δ→0} Prob(t ≤ T ≤ t+Δ | T ≥ t)/Δ = f(t|θ)/S(t). (Eq. 3)

The hazard function is important as an aid for interpreting the results of our econometric investigations because the covariate coefficients that we wish to characterize report the response of the hazard rate to changes in the various covariates. In the case of the exponential pdf

f_Exp(T|θ) = λ exp(−λT), (Eq. 4)

the survival function assumes the simple form

S_Exp(T|θ) = exp(−λT), (Eq. 5)

giving rise to the hazard function

H_Exp(T) = λ, (Eq. 6)

which the reader will note is time-independent. This feature is the limiting property of the exponential model in duration analysis. As a tool for investigating survival of fishing enterprises, it predicts that the instantaneous rate of failure is independent of the duration of the vessel. As an alternative, the Weibull pdf

f_Wei(T|θ) = αT^(α−1) λ exp(−λT^α), (Eq. 7)

gives rise to a survival function of the form

S_Wei(T|θ) = exp(−λT^α), (Eq. 8)

which, in turn, gives rise to the hazard function

H_Wei(T) = λαT^(α−1). (Eq. 9)

In the Weibull context, the parameter α plays a crucial role in the analysis. A value of α exceeding one implies that the hazard function is increasing over time, and a value of α less than one implies that the hazard function is decreasing over time. Importantly, when α equals one the Weibull model reduces to the exponential model, which is nested as a special case within the family of Weibull distributions. Consequently, much interest in empirical applications of the Weibull pdf centers on the scale and location of the posterior distribution for α.
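The role of α in Eqs. 6 and 9 can be illustrated directly: α = 1 collapses the Weibull hazard to the constant exponential hazard, while α above or below one makes it rise or fall with duration. Parameter values below are illustrative only:

```python
import numpy as np

def weibull_hazard(t, lam, alpha):
    # H(t) = lam * alpha * t**(alpha - 1), as in Eq. 9;
    # alpha = 1 recovers the constant exponential hazard of Eq. 6.
    return lam * alpha * t ** (alpha - 1.0)

t = np.array([1.0, 2.0, 5.0, 10.0])
h_exp = weibull_hazard(t, lam=0.1, alpha=1.0)   # constant: 0.1 everywhere
h_up = weibull_hazard(t, lam=0.1, alpha=1.3)    # hazard increasing in t
h_down = weibull_hazard(t, lam=0.1, alpha=0.7)  # hazard decreasing in t

assert np.allclose(h_exp, 0.1)
assert np.all(np.diff(h_up) > 0) and np.all(np.diff(h_down) < 0)
```

With the paper's posterior mean of α ≈ 1.29, the estimated hazard would behave like the increasing case above.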
To build a model that includes covariates (which, for want of a better term, we call a regression model employing either the exponential or Weibull pdfs), we first allow the parameter λ to vary across vessels and focus attention on the vector λ ≡ (λ_1, λ_2, ..., λ_N). Second, we parameterize each of the vessel-specific elements of λ as λ ≡ (exp(x_1β), exp(x_2β), ..., exp(x_Nβ)), where x_1 ≡ (x_11, x_12, ..., x_1K), x_2 ≡ (x_21, x_22, ..., x_2K), ..., and x_N ≡ (x_N1, x_N2, ..., x_NK) are K-vectors of vessel-specific covariates and β ≡ (β_1, β_2, ..., β_K) denotes a K-vector of unobserved regression coefficients that is assumed to be the same across the N vessels in the industry.
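This exponential-link parameterization, λ_i = exp(x_i β), guarantees a positive rate for every vessel whatever the covariate values. A minimal sketch, with made-up covariates and coefficients rather than the salmon-fleet data:

```python
import numpy as np

# Vessel-specific rates lam_i = exp(x_i' beta). X and beta are
# illustrative stand-ins, not the paper's covariates or estimates.
rng = np.random.default_rng(0)
N, K = 5, 4
X = rng.normal(size=(N, K))              # row i is the covariate vector x_i
beta = np.array([0.1, -0.3, 0.05, 0.2])  # shared coefficient vector

lam = np.exp(X @ beta)                   # one rate per vessel
assert lam.shape == (N,) and np.all(lam > 0)  # exp link keeps rates positive
```

The positivity constraint is the reason the log-linear link is the conventional choice in duration regressions.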
To implement these models, we form a prior pdf over the parameters, π(θ), where θ ≡ (β_1, β_2, ..., β_K) in the case of the exponential specification and θ ≡ (β_1, β_2, ..., β_K, α) in the case of the Weibull specification. We form the likelihood f(y|θ) for the observed duration data y ≡ (y_1, y_2, ..., y_N) and study the posterior distribution for the parameters

π(θ|y) ∝ f(y|θ) π(θ), (Eq. 10)

where ∝ denotes "is proportional to." An important feature of duration studies is that some of the duration observations y ≡ (y_1, y_2, ..., y_N) will be censored. Specifically, if t ≡ (t_1, t_2, ..., t_N) denotes the survival times of the vessels in question and τ denotes the endpoint of the study, then we observe y ≡ (min(t_1, τ), min(t_2, τ), ..., min(t_N, τ)). To formalize the likelihood function in the presence of such censoring, a useful construct is the vector of binary indicators ν ≡ (ν_1, ν_2, ..., ν_N), where, for i = 1, 2, ..., N, ν_i = 1 if t_i ≤ τ and ν_i = 0 otherwise. Accordingly, the likelihood corresponding to a set of observed durations is

f(y|θ) ∝ Π_i f(y_i|θ)^(ν_i) S(y_i|θ)^(1−ν_i). (Eq. 11)

No conjugate priors exist for either the exponential or the Weibull regression model. The prior specifications advocated by Ibrahim et al. [3] are the multivariate-normal density f_MN(β|e_β0, C_β0) ∝ exp{−0.5(β−e_β0)′C_β0⁻¹(β−e_β0)} for the covariate coefficients and a Gamma density f_G(α|γ_0, κ_0) ∝ α^(γ_0−1) exp(−κ_0 α) for the scale parameter. Accordingly, defining d = Σ_i ν_i, the posterior pdf corresponding to the exponential specification has the form

π(θ|y) ∝ exp{Σ_i [ν_i x_iβ − y_i exp(x_iβ)] − 0.5(β−e_β0)′C_β0⁻¹(β−e_β0)} (Eq. 12)

and the posterior pdf corresponding to the Weibull specification has the form

π(θ|y) ∝ α^(γ_0+d−1) exp{Σ_i [ν_i x_iβ + ν_i(α−1)log(y_i) − y_i^α exp(x_iβ)] − κ_0 α − 0.5(β−e_β0)′C_β0⁻¹(β−e_β0)}. (Eq. 13)

An important aspect of both specifications is that the fully conditional distributions characterizing the joint posteriors do not have known integrating constants.
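The Weibull posterior kernel above translates directly into a log-posterior function. The sketch below assumes a prior mean of zero for β and, with κ_0 = 0 and C_β0⁻¹ = 0, reproduces the diffuse-prior setting used later in the paper; the data at the bottom are made-up illustrative values:

```python
import numpy as np

def log_post_weibull(beta, alpha, X, y, nu, gamma0=1.0, kappa0=0.0, Cinv=None):
    """Log of the Weibull posterior kernel (Eq. 13), up to a constant,
    assuming prior mean e_b0 = 0. Setting kappa0 = 0 and Cinv = 0 gives
    diffuse priors. nu[i] = 1 marks an observed exit; nu[i] = 0 marks a
    right-censored duration."""
    if Cinv is None:
        Cinv = np.zeros((X.shape[1], X.shape[1]))
    xb = X @ beta
    d = nu.sum()                          # number of uncensored spells
    loglik = np.sum(nu * xb + nu * (alpha - 1.0) * np.log(y)
                    - y ** alpha * np.exp(xb))
    logprior = ((gamma0 + d - 1.0) * np.log(alpha) - kappa0 * alpha
                - 0.5 * beta @ Cinv @ beta)
    return loglik + logprior

# Illustrative data: three vessels, the third still active (censored).
X = np.array([[1.0, 0.5], [1.0, -0.2], [1.0, 0.0]])
y = np.array([3.0, 1.0, 7.0])
nu = np.array([1.0, 1.0, 0.0])
lp = log_post_weibull(np.array([-1.0, 0.2]), 1.2, X, y, nu)
```

A useful check is that at α = 1 the Weibull kernel collapses to the exponential kernel of Eq. 12, since the α-dependent terms vanish.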
Consequently, estimating the components of θ over which these posteriors are defined requires that we implement a Markov chain Monte Carlo (MCMC) method. Ibrahim et al. [3] suggest exploiting the log-concavity of both posterior pdfs and employing the adaptive rejection sampling algorithm of Gilks and Wild [4]. Alternatively, we find that simple Metropolis-Hastings algorithms, employing a multivariate-normal proposal density for β and a Gamma proposal density for α, work well. In each case these algorithms were run for a total of 100,000 iterations, and convergence was assessed by examining the behaviour of a chi-square statistic formed from the differences in the posterior means of the parameters obtained from the second and fourth quartiles of the sample space. In the case of the exponential pdf, convergence is quite rapid: after about 2,000 iterations, posterior mean estimates are indistinguishable to two decimal places. Here we employ a target acceptance rate of 50% for the Metropolis steps. In the case of the Weibull distribution, convergence was slightly problematic and was achieved only after about 22,000 iterations. In this case it was found that a target acceptance rate of 75% works well. In addition, the algorithm appeared to work more efficiently when a large starting value (in excess of ten) was used for the scale parameter α. In both cases, inferences are derived from the second 50,000-observation samples of the 100,000-iteration chains and are obtained using diffuse priors, setting κ_0 = 0 and setting the K-dimensional square matrix C_β0⁻¹ equal to the K-dimensional null matrix.
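The Metropolis-Hastings update described above can be sketched generically. The paper tunes multivariate-normal and Gamma proposals to hit target acceptance rates; the simplified stand-in below uses a fixed-scale random walk and a toy standard-normal target in place of Eq. 12 or 13:

```python
import numpy as np

def rw_metropolis(logpost, init, steps, scale, rng):
    """Random-walk Metropolis sampler: a generic sketch of the kind of
    MH step embedded in the paper's sampler (fixed proposal scale here,
    rather than the tuned proposals used in the paper)."""
    x = np.asarray(init, dtype=float)
    lp = logpost(x)
    draws = np.empty((steps, x.size))
    accepted = 0
    for i in range(steps):
        prop = x + scale * rng.normal(size=x.shape)
        lp_prop = logpost(prop)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
            accepted += 1
        draws[i] = x
    return draws, accepted / steps

# Toy target: standard normal log-density stands in for the posterior.
rng = np.random.default_rng(1)
draws, acc_rate = rw_metropolis(lambda b: -0.5 * float(b @ b),
                                np.zeros(1), 20_000, 1.0, rng)
post_mean = draws[10_000:].mean()   # discard the first half as burn-in
```

Discarding the first half of the chain and summarizing the remainder mirrors the paper's practice of basing inferences on the second 50,000 draws of each 100,000-iteration run.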
RESULTS

Results of the two estimations are reported in Table 2, with 95% highest posterior density intervals reported in parentheses. In assessing the signs of the effects it is important to re-emphasize that the coefficients report the impacts of changes in the respective covariates on the hazard function, or a monotonic transformation of it. We consider the effects derived from the exponential specification before turning to those derived from the Weibull model. Generally speaking, the larger the boat, the larger the average revenue over the period of employment, and the larger the salmon share in a boat's total revenues, the lower the instantaneous probability of failure. In contrast, the larger the average number of ports visited, the greater the instantaneous rate of departure. These findings are generally what we would expect in a survival study of the establishments comprising the fishery. Larger boats may be more capable of fishing in widely dispersed fisheries, making them more profitable at the margin. Larger boats also require more capital investment, making owners less inclined to exit in the face of unforeseen shortfalls. Similarly, in the absence of cost information, average revenues are a proxy for the average profitability of the vessels in question. To the extent this proxy is a good one, one expects to observe higher revenues reducing the hazard. Average salmon share of total revenue represents a level of commitment to the fishery in question relative to other potentially gainful alternatives, so we expect this variable, too, to reduce the hazard. In contrast, we find that a higher number of ports visited per year increases the probability of departure, which may reflect high search costs and possibly lack of success in chosen locations.
We note that this result contrasts with Smith's [1] study of the California urchin fishery, which concludes that a higher number of ports visited decreased the instantaneous probability of exit, presumably reflecting greater knowledge or ambition on the part of more mobile divers.

Table 2. Empirical results (95% highest posterior density intervals in parentheses).

Effect                                      Exponential          Weibull
Boat Length (ft)                            (-2.01, 0.37)        (-2.67, 0.31)
Avg. Ports per Year with Salmon Landings    0.42 (-0.08, 0.97)   0.56 (0.00, 1.39)
Avg. Salmon Rev. per Year (yr 2000 USD)     (-10.06, -7.90)      (-17.32, -6.09)
Share of Salmon in Total Boat Revenue       (-0.95, -0.76)       (-1.63, -0.57)
Constant                                    (-0.64, -0.34)       (-1.11, -0.25)
Scale Parameter α                                                1.29 (0.72, 1.75)

Turning to the Weibull results, the reader will note that they mimic quite closely those obtained from the simpler exponential specification. Significantly, the Weibull results exhibit the same patterns of significance and signs. The scale parameter, α, which distinguishes the Weibull from the exponential hazard function, has a posterior mean of 1.29, and a considerable proportion of its mass appears to lie above the value of one. A straightforward calculation from the output of the Weibull estimation reveals that, in fact, 94% of the posterior mass lies above this critical value. This leads to two conclusions. First, it suggests that the hazard is upward sloping in time. That is, the instantaneous rate of failure is positively related to duration; the longer a vessel has been involved in the fishery, the more likely it is to exit. Second, short of implementing a formal model comparison along the lines of Chib and Jeliazkov [5], we can safely conclude that the Weibull model, as opposed to the simpler exponential model, appears to be preferred by the data. In this context an interesting question arising is the magnitude of any difference in the predictions derived from the two models. The natural context within which to investigate this issue is
the survival function predictions of the respective formulations. We report these predictions graphically in Figure 4. Interestingly, the two survival functions derived from the independent algorithms are almost identical. Hence, at least with respect to the data employed in this analysis, we can safely conclude that the use of the exponential model to predict survival rates among fishers in the California salmon fishery does not appear to distort model results.

Figure 4. Survival functions inferred from exponential and Weibull model estimations.

SUMMARY AND CONCLUSIONS

In this paper we introduce two of the fundamental specifications of survival analysis and apply them to data on the California salmon fishery using a Bayesian MCMC simulator. In so doing we extend the excellent work of Smith [1]. The empirical results are intuitive and do not appear to be greatly affected by the choice of specification. Whether this conclusion remains robust to other choices of specification remains to be seen. Because fishers' participation decisions reflect the financial condition of the fleet and strongly influence the efficacy of measures designed to protect the fish stock, a better understanding of these decisions can contribute directly to management. In the California commercial salmon fishery, the number of boats active in the fishery has stabilized after a long period of steady decline. While we have not yet explored formal predictions of fleet dynamics based on our econometric results, the models presented here appear to have significant explanatory power, providing strong evidence for what intuition would suggest: boats that generate more revenue, that have greater capability to pursue other fisheries outside of the salmon season, and that concentrate more on salmon during the salmon season are less likely to leave the salmon fishery.
Given the importance of effort prediction in the development of regulations for this fishery, this kind of information is potentially quite useful to managers.

REFERENCES

[1] Smith, M.D., 2004, Limited-Entry Licensing: Insights from a Duration Model, American Journal of Agricultural Economics, 86(3).
[2] Greene, W.H., 1997, Econometric Analysis, Upper Saddle River, NJ: Prentice Hall.
[3] Ibrahim, J.G., M.-H. Chen, and D. Sinha, 2001, Bayesian Survival Analysis, New York: Springer.
[4] Gilks, W.R., and P. Wild, 1992, Adaptive Rejection Sampling for Gibbs Sampling, Applied Statistics, 41.
[5] Chib, S., and I. Jeliazkov, 2001, Marginal Likelihood from the Metropolis-Hastings Output, Journal of the American Statistical Association, 96.
Comparison of resampling method applied to censored data
International Journal of Advanced Statistics and Probability, 2 (2) (2014) 48-55 c Science Publishing Corporation www.sciencepubco.com/index.php/ijasp doi: 10.14419/ijasp.v2i2.2291 Research Paper Comparison
A Review of Cross Sectional Regression for Financial Data You should already know this material from previous study
A Review of Cross Sectional Regression for Financial Data You should already know this material from previous study But I will offer a review, with a focus on issues which arise in finance 1 TYPES OF FINANCIAL
Analysis of Financial Time Series
Analysis of Financial Time Series Analysis of Financial Time Series Financial Econometrics RUEY S. TSAY University of Chicago A Wiley-Interscience Publication JOHN WILEY & SONS, INC. This book is printed
Cluster Analysis for Evaluating Trading Strategies 1
CONTRIBUTORS Jeff Bacidore Managing Director, Head of Algorithmic Trading, ITG, Inc. [email protected] +1.212.588.4327 Kathryn Berkow Quantitative Analyst, Algorithmic Trading, ITG, Inc. [email protected]
The Variability of P-Values. Summary
The Variability of P-Values Dennis D. Boos Department of Statistics North Carolina State University Raleigh, NC 27695-8203 [email protected] August 15, 2009 NC State Statistics Departement Tech Report
Validation of Software for Bayesian Models Using Posterior Quantiles
Validation of Software for Bayesian Models Using Posterior Quantiles Samantha R. COOK, Andrew GELMAN, and Donald B. RUBIN This article presents a simulation-based method designed to establish the computational
Poisson Models for Count Data
Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the
Spatial Statistics Chapter 3 Basics of areal data and areal data modeling
Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data
Introduction to Markov Chain Monte Carlo
Introduction to Markov Chain Monte Carlo Monte Carlo: sample from a distribution to estimate the distribution to compute max, mean Markov Chain Monte Carlo: sampling using local information Generic problem
Dirichlet Processes A gentle tutorial
Dirichlet Processes A gentle tutorial SELECT Lab Meeting October 14, 2008 Khalid El-Arini Motivation We are given a data set, and are told that it was generated from a mixture of Gaussian distributions.
Survival Analysis of Left Truncated Income Protection Insurance Data. [March 29, 2012]
Survival Analysis of Left Truncated Income Protection Insurance Data [March 29, 2012] 1 Qing Liu 2 David Pitt 3 Yan Wang 4 Xueyuan Wu Abstract One of the main characteristics of Income Protection Insurance
Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln. Log-Rank Test for More Than Two Groups
Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln Log-Rank Test for More Than Two Groups Prepared by Harlan Sayles (SRAM) Revised by Julia Soulakova (Statistics)
SUMAN DUVVURU STAT 567 PROJECT REPORT
SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.
STATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing [email protected]
SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing [email protected] IN SPSS SESSION 2, WE HAVE LEARNT: Elementary Data Analysis Group Comparison & One-way
Alessandro Birolini. ineerin. Theory and Practice. Fifth edition. With 140 Figures, 60 Tables, 120 Examples, and 50 Problems.
Alessandro Birolini Re ia i it En ineerin Theory and Practice Fifth edition With 140 Figures, 60 Tables, 120 Examples, and 50 Problems ~ Springer Contents 1 Basic Concepts, Quality and Reliability Assurance
Section A. Index. Section A. Planning, Budgeting and Forecasting Section A.2 Forecasting techniques... 1. Page 1 of 11. EduPristine CMA - Part I
Index Section A. Planning, Budgeting and Forecasting Section A.2 Forecasting techniques... 1 EduPristine CMA - Part I Page 1 of 11 Section A. Planning, Budgeting and Forecasting Section A.2 Forecasting
Using simulation to calculate the NPV of a project
Using simulation to calculate the NPV of a project Marius Holtan Onward Inc. 5/31/2002 Monte Carlo simulation is fast becoming the technology of choice for evaluating and analyzing assets, be it pure financial
Java Modules for Time Series Analysis
Java Modules for Time Series Analysis Agenda Clustering Non-normal distributions Multifactor modeling Implied ratings Time series prediction 1. Clustering + Cluster 1 Synthetic Clustering + Time series
PREDICTIVE DISTRIBUTIONS OF OUTSTANDING LIABILITIES IN GENERAL INSURANCE
PREDICTIVE DISTRIBUTIONS OF OUTSTANDING LIABILITIES IN GENERAL INSURANCE BY P.D. ENGLAND AND R.J. VERRALL ABSTRACT This paper extends the methods introduced in England & Verrall (00), and shows how predictive
The HB. How Bayesian methods have changed the face of marketing research. Summer 2004
The HB How Bayesian methods have changed the face of marketing research. 20 Summer 2004 Reprinted with permission from Marketing Research, Summer 2004, published by the American Marketing Association.
Interpretation of Somers D under four simple models
Interpretation of Somers D under four simple models Roger B. Newson 03 September, 04 Introduction Somers D is an ordinal measure of association introduced by Somers (96)[9]. It can be defined in terms
A Bayesian hierarchical surrogate outcome model for multiple sclerosis
A Bayesian hierarchical surrogate outcome model for multiple sclerosis 3 rd Annual ASA New Jersey Chapter / Bayer Statistics Workshop David Ohlssen (Novartis), Luca Pozzi and Heinz Schmidli (Novartis)
Model Selection and Claim Frequency for Workers Compensation Insurance
Model Selection and Claim Frequency for Workers Compensation Insurance Jisheng Cui, David Pitt and Guoqi Qian Abstract We consider a set of workers compensation insurance claim data where the aggregate
Detection of changes in variance using binary segmentation and optimal partitioning
Detection of changes in variance using binary segmentation and optimal partitioning Christian Rohrbeck Abstract This work explores the performance of binary segmentation and optimal partitioning in the
VOLATILITY AND DEVIATION OF DISTRIBUTED SOLAR
VOLATILITY AND DEVIATION OF DISTRIBUTED SOLAR Andrew Goldstein Yale University 68 High Street New Haven, CT 06511 [email protected] Alexander Thornton Shawn Kerrigan Locus Energy 657 Mission St.
The primary goal of this thesis was to understand how the spatial dependence of
5 General discussion 5.1 Introduction The primary goal of this thesis was to understand how the spatial dependence of consumer attitudes can be modeled, what additional benefits the recovering of spatial
Introduction to time series analysis
Introduction to time series analysis Margherita Gerolimetto November 3, 2010 1 What is a time series? A time series is a collection of observations ordered following a parameter that for us is time. Examples
CHAPTER 8 EXAMPLES: MIXTURE MODELING WITH LONGITUDINAL DATA
Examples: Mixture Modeling With Longitudinal Data CHAPTER 8 EXAMPLES: MIXTURE MODELING WITH LONGITUDINAL DATA Mixture modeling refers to modeling with categorical latent variables that represent subpopulations
CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA
Examples: Multilevel Modeling With Complex Survey Data CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA Complex survey data refers to data obtained by stratification, cluster sampling and/or
Forecasting methods applied to engineering management
Forecasting methods applied to engineering management Áron Szász-Gábor Abstract. This paper presents arguments for the usefulness of a simple forecasting application package for sustaining operational
Model-based Synthesis. Tony O Hagan
Model-based Synthesis Tony O Hagan Stochastic models Synthesising evidence through a statistical model 2 Evidence Synthesis (Session 3), Helsinki, 28/10/11 Graphical modelling The kinds of models that
Probabilistic Methods for Time-Series Analysis
Probabilistic Methods for Time-Series Analysis 2 Contents 1 Analysis of Changepoint Models 1 1.1 Introduction................................ 1 1.1.1 Model and Notation....................... 2 1.1.2 Example:
Model Calibration with Open Source Software: R and Friends. Dr. Heiko Frings Mathematical Risk Consulting
Model with Open Source Software: and Friends Dr. Heiko Frings Mathematical isk Consulting Bern, 01.09.2011 Agenda in a Friends Model with & Friends o o o Overview First instance: An Extreme Value Example
Standard errors of marginal effects in the heteroskedastic probit model
Standard errors of marginal effects in the heteroskedastic probit model Thomas Cornelißen Discussion Paper No. 320 August 2005 ISSN: 0949 9962 Abstract In non-linear regression models, such as the heteroskedastic
Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.
Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are
Basic Bayesian Methods
6 Basic Bayesian Methods Mark E. Glickman and David A. van Dyk Summary In this chapter, we introduce the basics of Bayesian data analysis. The key ingredients to a Bayesian analysis are the likelihood
Distribution (Weibull) Fitting
Chapter 550 Distribution (Weibull) Fitting Introduction This procedure estimates the parameters of the exponential, extreme value, logistic, log-logistic, lognormal, normal, and Weibull probability distributions
Dealing with large datasets
Dealing with large datasets (by throwing away most of the data) Alan Heavens Institute for Astronomy, University of Edinburgh with Ben Panter, Rob Tweedie, Mark Bastin, Will Hossack, Keith McKellar, Trevor
Time series Forecasting using Holt-Winters Exponential Smoothing
Time series Forecasting using Holt-Winters Exponential Smoothing Prajakta S. Kalekar(04329008) Kanwal Rekhi School of Information Technology Under the guidance of Prof. Bernard December 6, 2004 Abstract
On Marginal Effects in Semiparametric Censored Regression Models
On Marginal Effects in Semiparametric Censored Regression Models Bo E. Honoré September 3, 2008 Introduction It is often argued that estimation of semiparametric censored regression models such as the
Lecture 9: Introduction to Pattern Analysis
Lecture 9: Introduction to Pattern Analysis g Features, patterns and classifiers g Components of a PR system g An example g Probability definitions g Bayes Theorem g Gaussian densities Features, patterns
From the help desk: Swamy s random-coefficients model
The Stata Journal (2003) 3, Number 3, pp. 302 308 From the help desk: Swamy s random-coefficients model Brian P. Poi Stata Corporation Abstract. This article discusses the Swamy (1970) random-coefficients
CHOICES The magazine of food, farm, and resource issues
CHOICES The magazine of food, farm, and resource issues 4th Quarter 2005 20(4) A publication of the American Agricultural Economics Association Logistics, Inventory Control, and Supply Chain Management
CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE
1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 3 Problems with small populations 8 II. Why Random Sampling is Important 9 A myth,
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical
Applying MCMC Methods to Multi-level Models submitted by William J Browne for the degree of PhD of the University of Bath 1998 COPYRIGHT Attention is drawn tothefactthatcopyright of this thesis rests with
Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014
Probabilistic Models for Big Data Alex Davies and Roger Frigola University of Cambridge 13th February 2014 The State of Big Data Why probabilistic models for Big Data? 1. If you don t have to worry about
Statistical Models in R
Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Structure of models in R Model Assessment (Part IA) Anova
Non Linear Dependence Structures: a Copula Opinion Approach in Portfolio Optimization
Non Linear Dependence Structures: a Copula Opinion Approach in Portfolio Optimization Jean- Damien Villiers ESSEC Business School Master of Sciences in Management Grande Ecole September 2013 1 Non Linear
