Monte Carlo simulation models: Sampling from the joint distribution of "State-of-Nature" parameters
Erik Jørgensen
Biometry Research Unit, Danish Institute of Agricultural Sciences, P.O. Box 50, DK-8830 Tjele, Denmark
E-mail: Erik.Jorgensen@agrsci.dk

Abstract

When using Monte Carlo simulation models for decision support, it is important to represent the full uncertainty that faces the decision maker. This paper focuses on approaches towards specifying the uncertainty in input parameters, also called the "state of nature". Until recently, such specification has only been possible in practice under special conditions, e.g., independent parameters or parameters following specific multivariate distributions, such as the normal distribution. As a result of advances in Bayesian statistical methodology, it is now possible to specify much more complicated distributions, and the distributions can even be found conditional on observations made prior to the simulations. The paper presents cases to illustrate the potential.

1 Introduction

Models of animal production systems are widely used for investigating different production strategies. Many of these models use Monte Carlo simulation techniques to calculate output variables of interest. Because such models are highly complex, even the correct specification of model input parameters is difficult. As a result, the uncertainty in the input parameters is usually ignored. When the model is used for studying the behaviour of the system, this may not be important. However, when the model is used for decision support, the full uncertainty facing the decision maker needs to be considered. In the so-called Dina pig model (Jørgensen & Kristensen, 1995) the intention is to include the full uncertainty in the model, i.e., the uncertainty in model parameters is included as well as the usual uncertainty in system development with known parameters.
Taking the Dina pig model as a starting point, this paper illustrates approaches toward correct specification of input parameters.

1.1 Elements of simulation models

We first present some elements of the Monte Carlo simulation method, mainly to introduce the notation used in the paper. Details of the methods can be found in textbooks such as Fishman (1996). Essentially, the Monte Carlo simulation method is a method for evaluating an integral

$$\Psi = E_\pi\{U(X)\} = \int U(x)\,\pi(x)\,dx \qquad (1)$$

where $E_\pi()$ is the expectation with respect to the probability density $\pi$ and $U()$ is some response function, e.g., a utility function. It involves generating random draws $X = x^{(j)}$ from the target distribution $\pi$ and then estimating $\Psi$ by

$$\hat\Psi = \frac{1}{k}\left\{U(x^{(1)}) + \cdots + U(x^{(k)})\right\} \qquad (2)$$

In our context, $X = \{\Theta, \Phi\}$ is a vector consisting of decision parameters, $\Theta$, and system parameters and state variables, $\Phi$. The Monte Carlo method is thus a numerical method for evaluating the integral in Eq. (1). In addition, if the random draws $x^{(j)}$ are independent, we can easily obtain an estimate of the error of the approximation using the Central Limit Theorem (see e.g., Fishman, 1996, section 2.2).

Often, it is an advantage to reformulate the integral in Eq. (1) by splitting $\Phi$ into the so-called state of nature, $\Phi_0$, and parameters and state variables $\Phi_s = \{\Phi_{1s}, \Phi_{2s}, \ldots, \Phi_{Ts}\}$ that are calculated by the model. (The additional index denotes model step, e.g., model time.) A subset of $\Phi_s$, $\Omega$, is called the output of the model. This splitting of the parameter vector leads to a reformulation of Eq. (1):

$$\Psi = E_{\pi_0}\left\{E_{\pi_{s|0}}\{U(X)\}\right\} = \int \left\{ \int U(x)\,\frac{\pi(x)}{\pi_0(\Phi_0)}\,d\{\Theta, \Phi_s\} \right\} \pi_0(\Phi_0)\,d\Phi_0 \qquad (3)$$

where $E_{\pi_{s|0}}\{U(X)\}$ denotes the conditional expectation of $U(X)$ for a given state of nature $\Phi_0$. The dimension of $\Phi_0$ will in general be fixed by the model structure, while the number of elements in $\Phi_s$ will vary with different decisions and different combinations of the other elements in $\Phi$.
Disregarding the problem of dimensionality, the integration with respect to $\Phi_0$ is well behaved and lends itself to techniques other than simple Monte Carlo simulation. In contrast, the integration with respect to $\Phi_s$ is so complex that it is only feasible to solve using the Monte Carlo method. (Note that the dimension of $\Phi_0$ in such models is often in excess of one hundred, so even though the integral is well behaved, its evaluation is complicated.)
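The estimator in Eq. (2) and its Central Limit Theorem error estimate can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-stage draw (a state of nature followed by a simulated state given it) mirrors the structure of Eq. (3), but the distributions, their parameters, the utility function and all function names are hypothetical and are not taken from the Dina pig model.

```python
import math
import random
import statistics


def monte_carlo_estimate(draw_x, utility, k, rng):
    """Estimate Psi = E[U(X)] from k independent draws, as in Eq. (2)."""
    values = [utility(draw_x(rng)) for _ in range(k)]
    psi_hat = statistics.fmean(values)
    # Central Limit Theorem: the standard error of the mean is s / sqrt(k),
    # valid because the draws are independent.
    std_err = statistics.stdev(values) / math.sqrt(k)
    return psi_hat, std_err


def draw_x(rng):
    """Hypothetical two-stage draw mirroring the split in Eq. (3)."""
    phi_0 = rng.gauss(1.0, 0.2)    # state of nature (assumed prior)
    phi_s = rng.gauss(phi_0, 0.5)  # simulated state given phi_0 (assumed model)
    return phi_s


def utility(x):
    """Hypothetical response function U()."""
    return x * x


rng = random.Random(1)
psi_hat, std_err = monte_carlo_estimate(draw_x, utility, 20000, rng)
print(f"Psi estimate: {psi_hat:.3f} +/- {1.96 * std_err:.3f}")
```

For this toy setup the exact value is 1 + 0.2² + 0.5² = 1.29, which gives a simple check that the estimate and its interval behave as expected.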
1.2 Additional information concerning the state of nature

Often, we want to use the model in a specific context, e.g., to predict the effect of different production strategies within a specific herd. In this case, we have additional information concerning the model parameters, i.e., registrations $y$ related to the model parameters, and we want to base our inference on the conditional distribution of the parameters given the observations, $\pi_0(\Phi_0 \mid y)$. Note that this implies that we additionally specify a model of the joint distribution of the parameters $\Phi_0$ and the observations $y$. But this is exactly the purpose of our simulation model, i.e., the output parameters $\Omega$ are usually observable and the observations $y$ are a subset of $\Omega$.

In Jørgensen (2000a), calibration of model parameters with observations of model output parameters is described. However, due to complexity issues this approach has limitations. In the present context, we will therefore concentrate on the situation where we are able to specify an alternative model of the relation between the observations and the model parameters,

$$\pi(\Omega \mid Y = y, \Phi_0) = \pi(\Omega \mid \Phi_0)\,\pi(\Phi_0 \mid Y = y)$$

though there is an inherent inconsistency in the approach, as $y \subset \Omega$. Of course, we may argue that $y$ is independent of the features we explore in the model, i.e., decisions and capacity restrictions.

The problem handled in the present paper is how to specify the joint probability distribution of the parameters, $\pi_0(\Phi_0)$, in order to make it possible to draw pseudo-random instantiations $\Phi_0^{(i)}$ from the distribution. The presentation is structured as a description of three cases.
The approach described is used in the Dina pig simulation model (Jørgensen & Kristensen, 1995), and may be combined with recent advances within Bayesian statistics.

1.3 The framework for specification

The specification of the prior distribution is similar to the specification needed within Bayesian approaches to statistical analysis and learning in expert systems (Spiegelhalter et al., 1993, 1996). One widely used program is WinBUGS (Spiegelhalter et al., 1999), which is intended for inference in graphical models using the Markov Chain Monte Carlo approach. The original intention in the Dina pig model was to use the WinBUGS language for the specification. However, in most cases the use of WinBUGS would be too inconvenient. Under assumptions of independence between parameters, the graphical model is simply a set of disconnected nodes. Therefore, the model specification in the Dina pig model follows the specification language in WinBUGS, but is integrated into the general model specification.
2 Case 1: Independent parameters

When specifying a probability distribution for the parameters in the state of nature, a simplifying assumption is that the parameters are independent. In the Dina pig model this is the standard assumption. The independence assumption implies that the joint density of the parameters is simply the product of the densities of the individual parameters, i.e.,

$$\pi(\Phi_0) = \pi(\Phi_{0,1})\,\pi(\Phi_{0,2}) \cdots \pi(\Phi_{0,n})$$

That is, for each state parameter $\Phi_{0,i}$ in the model, instead of only specifying the expected value, we have to select a probability distribution and the parameters describing this distribution. We will follow standard practice and call the latter hyper-parameters.

Very often it is most natural to specify the distribution of the parameter on a different scale than the actual parameter. Parameters describing proportions may be specified on a logit scale, and a log-normal distribution may be natural for some parameters. As an example, parameter values describing time until an event (i.e., positive values) may often be described as following a log-normal distribution. In that case, the normal distribution is selected with corresponding hyper-parameters, and the transformation is the exponential function exp(). The available distributions and transformations closely follow the notation in the WinBUGS manual.

2.1 Specification of growth related parameters

One of the available growth models in the Dina pig model is an extension of a simple Gompertz growth model, as described in Jørgensen (1998). We will use this model as an example of the specification of the prior distribution. The Gompertz growth model in its standard format is

$$\frac{dW}{dt} = k\,\{K - \ln(W_t)\}\,W_t \qquad (4)$$

where $W_t$ is the weight at time $t$, $k$ is the growth rate and $K$ is the logarithm of the asymptotic maximum weight. This produces a sigmoid curve that closely corresponds to the growth of the pig. Note that the description of $K$ should not be taken literally.
Extrapolation from measurements during the slaughter pig growth phase to the age when maximum weight is approached is not reliable.

The basic formula in Eq. (4) is modified in the simulation model, but it may still be recognised. A growth parameter called the current herd level at time $t$, $k_{ht}$, follows a first-order stationary autoregressive process with

$$k_{h(t+\Delta t)} = \mu_{kh} + \alpha(\Delta t)\,(k_{ht} - \mu_{kh}) + \beta(\Delta t)\,\varepsilon_h \qquad (5)$$

where $\mu_{kh}$ is the expected level (e.g., the population expectation) and $\varepsilon_h$ is random noise with $\varepsilon_h \sim N(0, \sigma_h^2)$. $\alpha(\Delta t) = \exp(-\alpha_0 \Delta t)$ and $\beta(\Delta t) = \sqrt{1 - \alpha(\Delta t)^2}$ are the autoregression parameters with the varying length of the time steps taken into account. $1 - \alpha_0$ corresponds approximately to the usual autoregression parameter with time step 1.

The individual growth parameter for each pig is $k_{pig}$. It is drawn from a normal distribution with expectation equal to the herd level at the time of the pig's introduction into the herd, i.e., $k_{pig} \sim N(k_{ht}, \sigma_k^2)$, with $t$ the time of introduction of the pig into the herd.

The specification of the herd-level growth rate $k_h$ is based on estimates of daily gain from production databases. Typically, the herd level of daily gain varies between 700 and 1000 g, which roughly corresponds to a standard deviation of 300/4 = 75 g. As the daily gain is a function of the $k$ parameter as well as the $K$ parameter, we use a first-order Taylor approximation, i.e.,

$$dg \approx f(k_0, K_0) + f'_k(k_0, K_0)\,(k - k_0) + f'_K(k_0, K_0)\,(K - K_0)$$

as the basis for an approximate variance

$$V(dg) \approx (f'_k)^2(k_0, K_0)\,V(k) + (f'_K)^2\,V(K)$$

Furthermore, we assume that 90% of the variation is due to variation in $k_h$. From these assumptions we find that $K \sim N(5.40, 1/\ldots)$ and $k_h \sim N(0.0116, 1/\ldots)$. (The normal distribution is parameterised with mean and precision ($= 1/\sigma^2$), following WinBUGS.) With the mean parameter values, the average daily gain is 885 g, based on the growth of a single animal from 77 to 175 days.

$\alpha_0$ is selected to obtain a correlation between herd levels 3 months apart of between 0.95 and 0.99, i.e., $\alpha_0 \sim N(0.0003, 1/\ldots)$.

The variation on herd level, $\sigma_h$, is specified to reflect that the variance within the herd is assumed to be between 0.25 and 0.5 of the total variance between herds. A log-normal scale is assumed, i.e., $\log(\sigma_h^2) \sim N(-6.6, \ldots)$.

Variation between pigs consists of a genetic part and a random walk part.
The specification is based on the assumption that after 90 days of growth the width of the confidence interval for live weight is 30 kg, corresponding to a standard deviation of 30/4 = 7.5 kg. Between 1/3 and 2/3 of the variance is permanent, corresponding to a $\sigma_k$ in the interval $[\ldots, \ldots]$. Therefore, we select the distribution $\sigma_k \sim N(\ldots, \ldots)$. The random walk part corresponds to an additional standard deviation in daily gain, uniformly distributed on [0.25, 0.75].

Similar considerations are made for the specification of the other model parameters, i.e., parameters describing start weight, feed intake, feed waste, slaughter waste (killing-out percentage), and the relationship between live weight and meat percentage. The available space does not allow us to present the data. Using the Dina pig model, the kernel density plots shown in Fig. 1 are produced for each input parameter.

[Figure 1: Prior distribution of variables (panels: k, Std.k, Std.dailygain). The rug indicates the values used in actual simulation runs.]
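The independent-parameter specification of Case 1 and the herd-level process in Eq. (5) can be sketched as follows. This is a minimal illustration, not the Dina pig implementation: the means 5.40, 0.0116 and 0.0003 come from the text, but every standard deviation, the log-scale prior for `sigma_h`, the `feed_waste` proportion and all function names are assumptions chosen only to make the example run.

```python
import math
import random


def draw_state_of_nature(rng):
    """One draw of independent parameters, each specified on its own scale."""
    return {
        # Normal on the natural scale (means from the text, sds assumed).
        "K": rng.gauss(5.40, 0.05),
        "mu_kh": rng.gauss(0.0116, 0.0005),
        "alpha_0": rng.gauss(0.0003, 0.00005),
        # Log-normal: normal on the log scale, transformed with exp()
        # (hypothetical hyper-parameters).
        "sigma_h": math.exp(rng.gauss(-7.0, 0.2)),
        # Logit-normal for a proportion (hypothetical parameter and values).
        "feed_waste": 1.0 / (1.0 + math.exp(-rng.gauss(-3.0, 0.5))),
    }


def herd_level_step(k_ht, dt, theta, rng):
    """Advance the herd level by a time step of length dt, as in Eq. (5)."""
    alpha = math.exp(-theta["alpha_0"] * dt)
    beta = math.sqrt(1.0 - alpha * alpha)
    eps = rng.gauss(0.0, theta["sigma_h"])
    return theta["mu_kh"] + alpha * (k_ht - theta["mu_kh"]) + beta * eps


rng = random.Random(11)
theta = draw_state_of_nature(rng)

# Simulate the herd level over time steps of unequal length (in days).
k_h = theta["mu_kh"]
path = [k_h]
for dt in (7, 7, 14, 30):
    k_h = herd_level_step(k_h, dt, theta, rng)
    path.append(k_h)
print(path)
```

Because the joint density is a product of the individual densities, each parameter is drawn separately; the choice of $\beta$ keeps the stationary variance of the process equal to $\sigma_h^2$ whatever the step length.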
3 Case 2: Using samples from Markov Chain Monte Carlo

The second case is taken from a study concerning the precision of clinical diagnosis (Bådsgaard & Jørgensen, 2000). The results will be used in section 4 as well. The case has been selected because it illustrates a situation that is almost standard when using simulation models for decision support. A hierarchical model is used for describing a population of subjects based on empirical data. When using the simulation model, we want to refer to a subject (e.g., a herd) from this population, either with no further information on the subject or with some additional information (e.g., previous performance). In this case, the prior distribution is estimated using the Markov Chain Monte Carlo approach via WinBUGS.

For clinical diseases, estimation of herd prevalence depends on how precise the veterinarian is. The precision is usually expressed as sensitivity, SE, the probability of correct identification of a diseased animal, and specificity, SP, the probability of correct identification of a healthy animal. SE and SP influence the observed prevalence in the herd. Consider the case where a veterinarian inspects 10 animals. We want to estimate the probability of observing $n_{obs}$ diseased animals, conditional on the true prevalence in the herd, $p_{dis}$, i.e.,

$$\Pr(n_{obs} = i \mid p_{dis} = p) = \iint \Pr(n_{obs} = i \mid p_{dis} = p, SE = u, SP = v)\,\pi(u, v)\,du\,dv \qquad (6)$$

where $\pi(u, v)$ is the joint probability density of the sensitivity and specificity of the veterinarian. In this context, the simulation model is very simple, i.e., with known parameters the observed number of diseased animals simply follows the binomial distribution. However, the parameters are not known. Our knowledge concerning the specificity and sensitivity of the veterinarian may either arise from specific knowledge of the vet based on previous observations, or from our general knowledge concerning the population of veterinarians. This is illustrated in Fig. 2.
In the study (Bådsgaard & Jørgensen, 2000) the distribution ($Vet_{pop}$) of the precision parameters ($Vet_i = \{SE, SP\}$) was quantified using an experimental setup where 4 veterinarians (only two are shown in the figure) simultaneously assessed clinical symptoms of a total of 155 animals. In the present context, we want to use the information from this study for estimation of $\pi(u, v)$ in Eq. (6). Two situations will be addressed: either the veterinarian is a specific one who participated in the quantification study ($Vet_2$), or a different veterinarian selected at random from the population ($Vet_3$).
[Figure 2: Schematic illustration of the clinical setup and our study. Panels: Quantification Study, "Simulation" model. Nodes: Vet pop, Vet 1, Vet 2, Vet 3, Symp, State, Herd 0, Herd 1, Herd 2, Herd pop.]

The WinBUGS analysis produces a sample $\{(SE^{(1)}, SP^{(1)}), (SE^{(2)}, SP^{(2)}), \ldots, (SE^{(n)}, SP^{(n)})\}$ from the relevant distributions, as illustrated by the kernel densities in Fig. 3 ($n$ is the sample size). Note that we may be able to approximate the joint distribution using some standard probability distribution such as the multivariate normal distribution. However, it is not obvious to what extent the parameters will follow such a distribution. Furthermore, the effort would be wasted: for our purpose, we need exactly what the MCMC approach produces, a sample from the correct joint distribution. In the present case a sample of … was produced.

To estimate the probability in Eq. (6), we proceed as follows for a given true prevalence $p_h$. First, the probability of observing disease symptoms is calculated:

$$p_o^{(i)} = SE^{(i)}\,p_h + (1 - p_h)\,(1 - SP^{(i)})$$

Then the number of animals with disease symptoms, $n_o^{(i)}$, is drawn from the binomial distribution, i.e., $n_o^{(i)} \sim \mathrm{Binomial}(10, p_o^{(i)})$. Finally, the distribution of $n_o^{(i)}$ over all $i$ is used for finding the desired probabilities.

In the present case the "simulation" model is so simple that calculation time and sample size are of (almost) no concern. However, with more complicated simulation models this issue becomes important. In contrast to the previous case, the samples produced by the MCMC approach are not independent. Therefore, the precision of the output of the simulation model is not simply $\hat\sigma/\sqrt{n}$.
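The consequence of dependent draws for the precision estimate can be illustrated with an effective sample size calculation. The sketch below is not part of the original study: it uses a synthetic first-order autoregressive chain as a stand-in for MCMC output, and a simple estimator of the effective sample size that sums sample autocorrelations up to the first non-positive one. The function names and the truncation rule are assumptions made for the illustration.

```python
import math
import random
import statistics


def autocorrelation(x, lag):
    """Sample autocorrelation of the sequence x at the given lag."""
    n = len(x)
    mean = statistics.fmean(x)
    var = sum((v - mean) ** 2 for v in x) / n
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / n
    return cov / var


def effective_sample_size(x, max_lag=200):
    """n / (1 + 2 * sum of autocorrelations), truncated at the first
    non-positive autocorrelation (a simple, assumed truncation rule)."""
    rho_sum = 0.0
    for lag in range(1, min(max_lag, len(x) - 1) + 1):
        rho = autocorrelation(x, lag)
        if rho <= 0.0:
            break
        rho_sum += rho
    return len(x) / (1.0 + 2.0 * rho_sum)


# Synthetic correlated chain standing in for MCMC output (assumed AR(1)).
rng = random.Random(7)
chain = [0.0]
for _ in range(4999):
    chain.append(0.8 * chain[-1] + rng.gauss(0.0, 1.0))

n_eff = effective_sample_size(chain)
naive_se = statistics.stdev(chain) / math.sqrt(len(chain))
adjusted_se = statistics.stdev(chain) / math.sqrt(n_eff)

# Thinning keeps every 10th draw, which reduces the dependence between
# successive retained draws.
thinned = chain[::10]
print(f"n = {len(chain)}, effective n = {n_eff:.0f}")
print(f"naive se = {naive_se:.4f}, adjusted se = {adjusted_se:.4f}")
```

Dividing by the effective rather than the nominal sample size gives a standard error that accounts for the autocorrelation, which matters most when each draw feeds an expensive simulation run.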
[Figure 3: Kernel density estimates on logit scale of sensitivity (a) and specificity (b) for a random veterinarian and for veterinarian no. 1. Axes: logit(SE), logit(SP); Density.]

Table 1: Distribution of number of diseased from clinical inspection with different herd health state. Random veterinarian. [Table body not preserved. Rows: herd health; columns: number of diseased (clinical).]

Table 2: Distribution of number of diseased from clinical inspection with different herd health state. Vet. no. 1 from experiment. [Table body not preserved. Rows: herd health; columns: number of diseased (clinical).]
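Distributions of the kind summarised in Tables 1 and 2 can be produced by pushing each (SE, SP) pair through the binomial model of section 3. The sketch below follows those steps, but the sample itself is an assumption: instead of the WinBUGS output it uses hypothetical, independently drawn logit-normal values, whereas a real MCMC sample would keep each sensitivity and specificity together as a joint pair. The function names and the prevalence value are likewise illustrative.

```python
import math
import random
from collections import Counter


def inv_logit(z):
    """Transform a value on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def observed_count_distribution(samples, p_h, n_animals, rng):
    """Distribution of the number of animals with symptoms, given the true
    prevalence p_h and a sample of (SE, SP) pairs."""
    counts = Counter()
    for se, sp in samples:
        # Probability that an inspected animal shows disease symptoms.
        p_o = se * p_h + (1.0 - p_h) * (1.0 - sp)
        # Binomial draw written as a sum of Bernoulli trials.
        n_o = sum(rng.random() < p_o for _ in range(n_animals))
        counts[n_o] += 1
    total = len(samples)
    return [counts[i] / total for i in range(n_animals + 1)]


rng = random.Random(3)
# Hypothetical stand-in for the MCMC sample of (SE, SP) pairs.
samples = [
    (inv_logit(rng.gauss(1.0, 0.5)), inv_logit(rng.gauss(2.5, 0.5)))
    for _ in range(5000)
]

dist = observed_count_distribution(samples, p_h=0.2, n_animals=10, rng=rng)
for n_obs, prob in enumerate(dist):
    print(n_obs, round(prob, 3))
```

Repeating the call for each herd health level gives one row of such a table per level; conditioning on a specific veterinarian only changes which sample of pairs is passed in.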
3.1 Conclusion

Markov Chain Monte Carlo methods seem ideally suited to the specification of prior distributions for use in simulation modelling. Even if standard statistical models such as generalized linear models may be more expedient for experimental analysis, MCMC may still be relevant because a random sample from the population is automatically produced. The only problem is that the samples are not drawn independently, but to a large extent this can be remedied by thinning the sample.

4 Case 3: Sampling from an expert system (Bayesian network)

The third case is taken from a project concerning intervention strategies for respiratory diseases, as presented in Otto (2000). The system uses the HUGIN™ program to formulate a probabilistic expert system for diagnosis and error detection concerning Mycoplasma. The present example is a slight modification of an example described in detail in Jørgensen (2000b). The final system will, in addition to the diagnostic network, include a module for Monte Carlo assessment of the cost-benefit of different control strategies.

The prevalence of the disease is expected to depend on management level and two risk factors. The prevalence may be observed either by the farmer or by a veterinarian. The precision of the farmer's observation depends on his ability as a manager. Disease prevalence and quality of management influence growth rate.

[Figure 4: Hugin expert system. Nodes: Manage, Risk 1, Risk 2, Gain, Preval, Farm obs, VetFind.]

The quantifications of the dependencies in Fig. 4 are based upon Stärk et al. (1998). Two of the risk factors in Table 10 of that paper have been selected with corresponding parameter estimates. Two additions have been made. The overdispersion has been modelled by a random herd effect, and an additional management factor not included in that study is added for illustration. The parameters from the logistic regression in Stärk et al. (1998) have thus been supplemented with the effect of management and between-herd variation.

Three factors influence herd prevalence: the management quality (Manage), manure removal in the nursery (Risk1), and the number of pigs per room in the nursery (Risk2). Each factor is categorized into discrete levels. The detailed model parameters are described in Jørgensen (2000b). The parameters are used to specify the necessary probability distribution tables in the Bayesian network in Fig. 4. The prevalence node is defined from a continuous variable, the prevalence of serologically positive animals. For the purpose of the model, the prevalence node is divided into 5 categories: no disease, from 1 to 10 percent disease, from 10 to 40 percent disease, from 40 to 60 percent disease, and above 60 percent disease. Based on the model we can calculate the probability of being in each of these categories of disease level for each combination of the parent nodes (risk factors). To illustrate, the probability distribution is shown for a selected part of the combinations of risk factors in Table 3.

Table 3: Distribution of herd health level for average management and selected risk factors. [Table body not preserved. Columns: no. of pigs (1st quartile, 2nd quartile), each split by manure removal (< daily, daily, > daily); rows: herd health level.]

The next step in the modelling is the specification of the problem detection by the farmer. In the present example Farm_obs is defined with two levels, "No problem observed" and "Problem observed". In his daily work, the farmer assesses the disease level continuously, but the measurement is not necessarily very precise. Furthermore, the observations may not lead to a problem detection, because the farmer may suppose that he is looking at a normal disease level, i.e., his threshold for problem detection is high.
A natural model of the farmer's observation is that good farmers are more precise in their observation, and that they tend to react to lower levels of disease. Based on these assumptions, the probability table of Farm_obs conditioned on management quality and health problem is specified.

The next node is the veterinarian's diagnosis, i.e., he visits the farm, samples 10 animals at random and makes a clinical inspection of each animal. The outcome of the clinical inspection is the number of diseased animals, i.e., the states of the Vet_Find1 node are {0, ..., 10}. If we know the herd prevalence and the precision of the veterinarian, we can calculate the probability distribution of the number of diseased animals based on the assumptions above. This is exactly the probability table that was specified in section 3 and Tables 1 and 2. Of course, the table needs to reflect our knowledge concerning the veterinarian.
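A network of this shape defines a joint distribution through its conditional probability tables, and independent draws from it can be obtained by sampling each node after its parents. The sketch below illustrates that idea only: it does not use the Hugin programming interface, all probability tables, category levels and function names are hypothetical, and conditioning on evidence is done here by simple rejection of draws, which is a stand-in for the exact propagation an expert system shell would perform.

```python
import random

# Representative prevalence for the 5 categories (assumed values).
PREV_LEVELS = [0.0, 0.05, 0.25, 0.50, 0.80]


def choose(rng, probs):
    """Draw an index according to a discrete probability vector."""
    u, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if u < acc:
            return i
    return len(probs) - 1


def prevalence_probs(manage, risk1, risk2):
    """Hypothetical table: more risk and poorer management shift mass upward."""
    score = (2 - manage) + risk1 + risk2  # 0 (best) .. 4 (worst)
    weights = [1.0 / (1 + abs(level - score)) for level in range(5)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_network(rng, se=0.7, sp=0.9):
    """One joint draw, sampling every node after its parents."""
    manage = choose(rng, [0.3, 0.4, 0.3])  # 0 poor, 1 average, 2 good
    risk1 = choose(rng, [0.6, 0.4])
    risk2 = choose(rng, [0.5, 0.5])
    preval = choose(rng, prevalence_probs(manage, risk1, risk2))
    p = PREV_LEVELS[preval]
    # Assumed: better managers detect problems at lower disease levels.
    farm_obs = rng.random() < min(1.0, p * (1.0 + manage))
    # Veterinarian inspects 10 animals with fixed (assumed) SE and SP.
    p_o = se * p + (1.0 - p) * (1.0 - sp)
    vet_find = sum(rng.random() < p_o for _ in range(10))
    return {"manage": manage, "risk1": risk1, "risk2": risk2,
            "preval": preval, "farm_obs": farm_obs, "vet_find": vet_find}


rng = random.Random(5)
draws = [sample_network(rng) for _ in range(20000)]

# Evidence: the farmer has observed a problem and the vet found >= 3 cases.
conditioned = [d for d in draws if d["farm_obs"] and d["vet_find"] >= 3]


def mean_level(sample):
    return sum(d["preval"] for d in sample) / len(sample)


print(len(conditioned), mean_level(draws), mean_level(conditioned))
```

Each retained draw is a full configuration of risk factors and prevalence consistent with the evidence, so it can serve directly as one state-of-nature instantiation for a subsequent simulation run.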
In the typical use of the expert system, we need to base our inference on evidence on a minimum of two nodes. The farmer will have detected a problem in the herd, and we will have the result of the veterinarian's inspection of the sample. Conditional on this evidence, we need to advise the farmer on whether he should change his production strategy and how it should be changed. Our expectation towards future production strategies will of course depend on the risk factors actually causing the problem. A high stocking rate might suggest an increase of herd size combined with sectioned production. But if there is poor management as well, the full benefit of sectioned production might not be obtained. The cost-benefit of control strategies thus depends on the combination of risk factors present. The Bayesian network contains the full joint probability of these combinations, and in the program Hugin a random sample from this joint distribution may readily be found, using existing procedures in the application programming interface (simulate). In contrast to the output from the MCMC method, the subsequent samples from Hugin are independent samples from the distribution.

5 Conclusion

In the present paper, different approaches towards specification of the prior distribution of "state-of-nature" parameters have been presented. The conclusion is that such a specification can be made readily using off-the-shelf methods, and that the possibilities for handling prior evidence concerning these parameters are good. The only word of caution is that the sample produced by MCMC does not consist of independent instantiations from the distribution, but this can be remedied by thinning, i.e., discarding instantiations.

However, it should be noted that the techniques are restricted to relationships between model parameters and evidence where it is not important to use the full simulation model. This relates especially to capacity restrictions and interactions between animals.
The ideas in Jørgensen (2000a) may be used in such cases.

Another important aspect not covered is the state of individual animals currently present in the herd. It is not possible to estimate the current state of an individual in the herd without taking observations and decisions into account. An individual remains in the herd because it has been decided not to cull it. We need techniques to calculate the probability distribution of the current state given the evidence that the animal is alive. If the use of simulation models is restricted to steady-state results of production strategies, we can avoid this problem.
References

Bådsgaard, N.P. & E. Jørgensen (2000). A Bayesian approach to estimating the reliability of clinical observations: with an application to herd prevalence estimation. Preventive Veterinary Medicine, in prep.

Fishman, G.S. (1996). Monte Carlo. Concepts, Algorithms, and Applications. Springer-Verlag New York, Inc.

Jørgensen, E. (1998). Stochastic modelling of pig production. Working Paper: Growth Models. Dina Notat, 73 pp URL: eps.

Jørgensen, E. (2000a). Calibration of a Monte Carlo Simulation Model of Disease Spread in Slaughter Pig Units. Computers and Electronics in Agriculture, 25 pp URL:

Jørgensen, E. (2000b). Elements of Bayesian network specification in an animal health research project. Internal report, Biometry Research Unit, Danish Institute of Agricultural Sciences, pp URL: diag1504a.pdf.

Jørgensen, E. & A.R. Kristensen (1995). An object oriented simulation model of a pig herd with emphasis on information flow. In FACTs 95, March 7, 8, 9, 1995, Orlando, Florida, Farm Animal Computer Technologies Conference, pp

Otto, L. (2000). Mycoplasma for pigs in a Bayesian Network: A decision support system. In Proc. "Economic modelling of Animal Health and Farm Management", November 23-24, 2000, Wageningen.

Spiegelhalter, D.J., A.P. Dawid, S.L. Lauritzen, & R.G. Cowell (1993). Bayesian Analysis in Expert Systems. Statistical Science, 8(3) pp

Spiegelhalter, D.J., A. Thomas, & N. Best (1996). Computation on Bayesian Graphical Models. Bayesian Statistics, 5 pp

Spiegelhalter, D.J., A. Thomas, N. Best, & W. Gilks (1999). WinBUGS. Version 1.2 User Manual. MRC Biostatistics Unit. URL: uk/bugs/welcome.html.

Stärk, K.D.C., D.U. Pfeiffer, & R.S. Morris (1998). Risk factors for respiratory disease in New Zealand pig herds. New Zealand Veterinary Journal, pp
More informationLogistic Regression (1/24/13)
STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used
More informationSupplement to Call Centers with Delay Information: Models and Insights
Supplement to Call Centers with Delay Information: Models and Insights Oualid Jouini 1 Zeynep Akşin 2 Yves Dallery 1 1 Laboratoire Genie Industriel, Ecole Centrale Paris, Grande Voie des Vignes, 92290
More informationMaximum likelihood estimation of mean reverting processes
Maximum likelihood estimation of mean reverting processes José Carlos García Franco Onward, Inc. jcpollo@onwardinc.com Abstract Mean reverting processes are frequently used models in real options. For
More informationQuantitative Inventory Uncertainty
Quantitative Inventory Uncertainty It is a requirement in the Product Standard and a recommendation in the Value Chain (Scope 3) Standard that companies perform and report qualitative uncertainty. This
More informationWhy Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts
More informationHandling attrition and non-response in longitudinal data
Longitudinal and Life Course Studies 2009 Volume 1 Issue 1 Pp 63-72 Handling attrition and non-response in longitudinal data Harvey Goldstein University of Bristol Correspondence. Professor H. Goldstein
More informationEquity-Based Insurance Guarantees Conference November 1-2, 2010. New York, NY. Operational Risks
Equity-Based Insurance Guarantees Conference November -, 00 New York, NY Operational Risks Peter Phillips Operational Risk Associated with Running a VA Hedging Program Annuity Solutions Group Aon Benfield
More informationTime series analysis as a framework for the characterization of waterborne disease outbreaks
Interdisciplinary Perspectives on Drinking Water Risk Assessment and Management (Proceedings of the Santiago (Chile) Symposium, September 1998). IAHS Publ. no. 260, 2000. 127 Time series analysis as a
More informationTutorial on Markov Chain Monte Carlo
Tutorial on Markov Chain Monte Carlo Kenneth M. Hanson Los Alamos National Laboratory Presented at the 29 th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Technology,
More informationDECISION MAKING UNDER UNCERTAINTY:
DECISION MAKING UNDER UNCERTAINTY: Models and Choices Charles A. Holloway Stanford University TECHNISCHE HOCHSCHULE DARMSTADT Fachbereich 1 Gesamtbibliothek Betrtebswirtscrtaftslehre tnventar-nr. :...2>2&,...S'.?S7.
More informationCHAPTER 3 CALL CENTER QUEUING MODEL WITH LOGNORMAL SERVICE TIME DISTRIBUTION
31 CHAPTER 3 CALL CENTER QUEUING MODEL WITH LOGNORMAL SERVICE TIME DISTRIBUTION 3.1 INTRODUCTION In this chapter, construction of queuing model with non-exponential service time distribution, performance
More informationMarketing Mix Modelling and Big Data P. M Cain
1) Introduction Marketing Mix Modelling and Big Data P. M Cain Big data is generally defined in terms of the volume and variety of structured and unstructured information. Whereas structured data is stored
More informationFinancial Mathematics and Simulation MATH 6740 1 Spring 2011 Homework 2
Financial Mathematics and Simulation MATH 6740 1 Spring 2011 Homework 2 Due Date: Friday, March 11 at 5:00 PM This homework has 170 points plus 20 bonus points available but, as always, homeworks are graded
More informationMore details on the inputs, functionality, and output can be found below.
Overview: The SMEEACT (Software for More Efficient, Ethical, and Affordable Clinical Trials) web interface (http://research.mdacc.tmc.edu/smeeactweb) implements a single analysis of a two-armed trial comparing
More informationAn analysis of price impact function in order-driven markets
Available online at www.sciencedirect.com Physica A 324 (2003) 146 151 www.elsevier.com/locate/physa An analysis of price impact function in order-driven markets G. Iori a;, M.G. Daniels b, J.D. Farmer
More informationSAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
More informationSimilarity Search and Mining in Uncertain Spatial and Spatio Temporal Databases. Andreas Züfle
Similarity Search and Mining in Uncertain Spatial and Spatio Temporal Databases Andreas Züfle Geo Spatial Data Huge flood of geo spatial data Modern technology New user mentality Great research potential
More informationOne-year reserve risk including a tail factor : closed formula and bootstrap approaches
One-year reserve risk including a tail factor : closed formula and bootstrap approaches Alexandre Boumezoued R&D Consultant Milliman Paris alexandre.boumezoued@milliman.com Yoboua Angoua Non-Life Consultant
More informationProblem of Missing Data
VASA Mission of VA Statisticians Association (VASA) Promote & disseminate statistical methodological research relevant to VA studies; Facilitate communication & collaboration among VA-affiliated statisticians;
More informationAnother Look at Sensitivity of Bayesian Networks to Imprecise Probabilities
Another Look at Sensitivity of Bayesian Networks to Imprecise Probabilities Oscar Kipersztok Mathematics and Computing Technology Phantom Works, The Boeing Company P.O.Box 3707, MC: 7L-44 Seattle, WA 98124
More informationThe Intelligent Pig Barn. Anders Ringgaard Kristensen University of Copenhagen
The Intelligent Pig Barn Anders Ringgaard Kristensen University of Copenhagen Who am I? Anders Ringgaard Kristensen: Born 1958 Grew up on a farm in Western Jutland Degrees: 1982: Animal Scientist 1985:
More informationBusiness Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.
Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing
More informationA spreadsheet Approach to Business Quantitative Methods
A spreadsheet Approach to Business Quantitative Methods by John Flaherty Ric Lombardo Paul Morgan Basil desilva David Wilson with contributions by: William McCluskey Richard Borst Lloyd Williams Hugh Williams
More informationPrinciple of Data Reduction
Chapter 6 Principle of Data Reduction 6.1 Introduction An experimenter uses the information in a sample X 1,..., X n to make inferences about an unknown parameter θ. If the sample size n is large, then
More informationBayesian Statistical Analysis in Medical Research
Bayesian Statistical Analysis in Medical Research David Draper Department of Applied Mathematics and Statistics University of California, Santa Cruz draper@ams.ucsc.edu www.ams.ucsc.edu/ draper ROLE Steering
More informationConfidence Intervals for One Standard Deviation Using Standard Deviation
Chapter 640 Confidence Intervals for One Standard Deviation Using Standard Deviation Introduction This routine calculates the sample size necessary to achieve a specified interval width or distance from
More informationJava Modules for Time Series Analysis
Java Modules for Time Series Analysis Agenda Clustering Non-normal distributions Multifactor modeling Implied ratings Time series prediction 1. Clustering + Cluster 1 Synthetic Clustering + Time series
More informationOverview of Monte Carlo Simulation, Probability Review and Introduction to Matlab
Monte Carlo Simulation: IEOR E4703 Fall 2004 c 2004 by Martin Haugh Overview of Monte Carlo Simulation, Probability Review and Introduction to Matlab 1 Overview of Monte Carlo Simulation 1.1 Why use simulation?
More informationBayesian Statistics: Indian Buffet Process
Bayesian Statistics: Indian Buffet Process Ilker Yildirim Department of Brain and Cognitive Sciences University of Rochester Rochester, NY 14627 August 2012 Reference: Most of the material in this note
More informationA Basic Introduction to Missing Data
John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item
More informationUncertainty of Power Production Predictions of Stationary Wind Farm Models
Uncertainty of Power Production Predictions of Stationary Wind Farm Models Juan P. Murcia, PhD. Student, Department of Wind Energy, Technical University of Denmark Pierre E. Réthoré, Senior Researcher,
More informationImputing Values to Missing Data
Imputing Values to Missing Data In federated data, between 30%-70% of the data points will have at least one missing attribute - data wastage if we ignore all records with a missing value Remaining data
More informationConfidence Intervals for Spearman s Rank Correlation
Chapter 808 Confidence Intervals for Spearman s Rank Correlation Introduction This routine calculates the sample size needed to obtain a specified width of Spearman s rank correlation coefficient confidence
More informationSimple linear regression
Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between
More informationStatistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013
Statistics I for QBIC Text Book: Biostatistics, 10 th edition, by Daniel & Cross Contents and Objectives Chapters 1 7 Revised: August 2013 Chapter 1: Nature of Statistics (sections 1.1-1.6) Objectives
More informationBetter decision making under uncertain conditions using Monte Carlo Simulation
IBM Software Business Analytics IBM SPSS Statistics Better decision making under uncertain conditions using Monte Carlo Simulation Monte Carlo simulation and risk analysis techniques in IBM SPSS Statistics
More informationGLMs: Gompertz s Law. GLMs in R. Gompertz s famous graduation formula is. or log µ x is linear in age, x,
Computing: an indispensable tool or an insurmountable hurdle? Iain Currie Heriot Watt University, Scotland ATRC, University College Dublin July 2006 Plan of talk General remarks The professional syllabus
More informationTEACHING SIMULATION WITH SPREADSHEETS
TEACHING SIMULATION WITH SPREADSHEETS Jelena Pecherska and Yuri Merkuryev Deptartment of Modelling and Simulation Riga Technical University 1, Kalku Street, LV-1658 Riga, Latvia E-mail: merkur@itl.rtu.lv,
More informationSTOCHASTIC MODELLING OF WATER DEMAND USING A SHORT-TERM PATTERN-BASED FORECASTING APPROACH
STOCHASTIC MODELLING OF WATER DEMAND USING A SHORT-TERM PATTERN-BASED FORECASTING APPROACH Ir. LAM Shing Tim Development(2) Division, Development Branch, Water Supplies Department. Abstract: Water demand
More informationA Bootstrap Metropolis-Hastings Algorithm for Bayesian Analysis of Big Data
A Bootstrap Metropolis-Hastings Algorithm for Bayesian Analysis of Big Data Faming Liang University of Florida August 9, 2015 Abstract MCMC methods have proven to be a very powerful tool for analyzing
More informationSection 14 Simple Linear Regression: Introduction to Least Squares Regression
Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship
More informationExploratory Data Analysis
Exploratory Data Analysis Johannes Schauer johannes.schauer@tugraz.at Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction
More informationNumerical Methods for Option Pricing
Chapter 9 Numerical Methods for Option Pricing Equation (8.26) provides a way to evaluate option prices. For some simple options, such as the European call and put options, one can integrate (8.26) directly
More informationThe VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series.
Cointegration The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series. Economic theory, however, often implies equilibrium
More informationSTATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
More informationFrom the help desk: Bootstrapped standard errors
The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution
More informationIntroduction to Markov Chain Monte Carlo
Introduction to Markov Chain Monte Carlo Monte Carlo: sample from a distribution to estimate the distribution to compute max, mean Markov Chain Monte Carlo: sampling using local information Generic problem
More informationConfidence Intervals for Cp
Chapter 296 Confidence Intervals for Cp Introduction This routine calculates the sample size needed to obtain a specified width of a Cp confidence interval at a stated confidence level. Cp is a process
More informationThe problem with waiting time
The problem with waiting time Why the only way to real optimization of any process requires discrete event simulation Bill Nordgren, MS CIM, FlexSim Software Products Over the years there have been many
More informationProbability and Statistics
Probability and Statistics Syllabus for the TEMPUS SEE PhD Course (Podgorica, April 4 29, 2011) Franz Kappel 1 Institute for Mathematics and Scientific Computing University of Graz Žaneta Popeska 2 Faculty
More informationBayesian Statistics in One Hour. Patrick Lam
Bayesian Statistics in One Hour Patrick Lam Outline Introduction Bayesian Models Applications Missing Data Hierarchical Models Outline Introduction Bayesian Models Applications Missing Data Hierarchical
More informationOptimal Stopping in Software Testing
Optimal Stopping in Software Testing Nilgun Morali, 1 Refik Soyer 2 1 Department of Statistics, Dokuz Eylal Universitesi, Turkey 2 Department of Management Science, The George Washington University, 2115
More informationSAS Certificate Applied Statistics and SAS Programming
SAS Certificate Applied Statistics and SAS Programming SAS Certificate Applied Statistics and Advanced SAS Programming Brigham Young University Department of Statistics offers an Applied Statistics and
More informationAn Application of Inverse Reinforcement Learning to Medical Records of Diabetes Treatment
An Application of Inverse Reinforcement Learning to Medical Records of Diabetes Treatment Hideki Asoh 1, Masanori Shiro 1 Shotaro Akaho 1, Toshihiro Kamishima 1, Koiti Hasida 1, Eiji Aramaki 2, and Takahide
More informationModel Calibration with Open Source Software: R and Friends. Dr. Heiko Frings Mathematical Risk Consulting
Model with Open Source Software: and Friends Dr. Heiko Frings Mathematical isk Consulting Bern, 01.09.2011 Agenda in a Friends Model with & Friends o o o Overview First instance: An Extreme Value Example
More informationAdequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection
Directions in Statistical Methodology for Multivariable Predictive Modeling Frank E Harrell Jr University of Virginia Seattle WA 19May98 Overview of Modeling Process Model selection Regression shape Diagnostics
More informationMaster s Theory Exam Spring 2006
Spring 2006 This exam contains 7 questions. You should attempt them all. Each question is divided into parts to help lead you through the material. You should attempt to complete as much of each problem
More informationBayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University caizhua@gmail.com
Bayesian Machine Learning (ML): Modeling And Inference in Big Data Zhuhua Cai Google Rice University caizhua@gmail.com 1 Syllabus Bayesian ML Concepts (Today) Bayesian ML on MapReduce (Next morning) Bayesian
More informationCHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS
Examples: Regression And Path Analysis CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS Regression analysis with univariate or multivariate dependent variables is a standard procedure for modeling relationships
More informationAn Introduction to Using WinBUGS for Cost-Effectiveness Analyses in Health Economics
Slide 1 An Introduction to Using WinBUGS for Cost-Effectiveness Analyses in Health Economics Dr. Christian Asseburg Centre for Health Economics Part 1 Slide 2 Talk overview Foundations of Bayesian statistics
More informationAnalysis of Financial Time Series
Analysis of Financial Time Series Analysis of Financial Time Series Financial Econometrics RUEY S. TSAY University of Chicago A Wiley-Interscience Publication JOHN WILEY & SONS, INC. This book is printed
More informationChapter 14 Managing Operational Risks with Bayesian Networks
Chapter 14 Managing Operational Risks with Bayesian Networks Carol Alexander This chapter introduces Bayesian belief and decision networks as quantitative management tools for operational risks. Bayesian
More informationPrentice Hall Algebra 2 2011 Correlated to: Colorado P-12 Academic Standards for High School Mathematics, Adopted 12/2009
Content Area: Mathematics Grade Level Expectations: High School Standard: Number Sense, Properties, and Operations Understand the structure and properties of our number system. At their most basic level
More informationDATA INTERPRETATION AND STATISTICS
PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE
More informationMachine Learning Logistic Regression
Machine Learning Logistic Regression Jeff Howbert Introduction to Machine Learning Winter 2012 1 Logistic regression Name is somewhat misleading. Really a technique for classification, not regression.
More informationPractical Applications of Stochastic Modeling for Disability Insurance
Practical Applications of Stochastic Modeling for Disability Insurance Society of Actuaries Session 8, Spring Health Meeting Seattle, WA, June 007 Practical Applications of Stochastic Modeling for Disability
More informationA Bayesian Antidote Against Strategy Sprawl
A Bayesian Antidote Against Strategy Sprawl Benjamin Scheibehenne (benjamin.scheibehenne@unibas.ch) University of Basel, Missionsstrasse 62a 4055 Basel, Switzerland & Jörg Rieskamp (joerg.rieskamp@unibas.ch)
More informationPr(X = x) = f(x) = λe λx
Old Business - variance/std. dev. of binomial distribution - mid-term (day, policies) - class strategies (problems, etc.) - exponential distributions New Business - Central Limit Theorem, standard error
More informationInequality, Mobility and Income Distribution Comparisons
Fiscal Studies (1997) vol. 18, no. 3, pp. 93 30 Inequality, Mobility and Income Distribution Comparisons JOHN CREEDY * Abstract his paper examines the relationship between the cross-sectional and lifetime
More informationThe Performance of Option Trading Software Agents: Initial Results
The Performance of Option Trading Software Agents: Initial Results Omar Baqueiro, Wiebe van der Hoek, and Peter McBurney Department of Computer Science, University of Liverpool, Liverpool, UK {omar, wiebe,
More information