Monte Carlo simulation models: Sampling from the joint distribution of State of Nature -parameters


Erik Jørgensen
Biometry Research Unit, Danish Institute of Agricultural Sciences, P.O. Box 50, DK-8830 Tjele, Denmark
E-mail: Erik.Jorgensen@agrsci.dk

Abstract
When using Monte Carlo simulation models for decision support, it is important to represent the full uncertainty that faces the decision maker. This paper focuses on approaches towards specifying the uncertainty in input parameters, also called the "state of nature". Until recently, such specification has only been possible in practice under special conditions, e.g., independent parameters or parameters following specific multivariate distributions such as the normal distribution. As a result of advances in Bayesian statistical methodology, it is now possible to specify much more complicated distributions, and the distributions can even be found conditional on observations made prior to the simulations. The paper presents cases to illustrate the potential.

1 Introduction
Models of animal production systems are widely used for investigating different production strategies. Many of these models use Monte Carlo simulation techniques to calculate output variables of interest. Because such models are highly complex, even specifying the model input parameters correctly is difficult, and the uncertainty in the input parameters is therefore usually ignored. When the model is used for studying the behaviour of the system, this may not be important. However, when the model is used for decision support, the full uncertainty facing the decision maker needs to be considered. In the so-called Dina pig model (Jørgensen & Kristensen, 1995) the intention is to include the full uncertainty in the model, i.e., the uncertainty in model parameters is included as well as the usual uncertainty in the development of a system with known parameters.

Taking the Dina pig model as its starting point, this paper illustrates approaches toward correct specification of input parameters.

1.1 Elements of simulation models
Initially, we present some elements of the Monte Carlo simulation method, mainly to introduce the notation used in the paper. Details of the methods can be found in textbooks such as Fishmann (1996). Essentially, the Monte Carlo simulation method is a method for evaluating an integral

$$\Psi = E_\pi\{U(X)\} = \int U(x)\,\pi(x)\,dx \qquad (1)$$

where $E_\pi\{\cdot\}$ is the expectation with respect to the probability density $\pi$ and $U(\cdot)$ is some response function, e.g., a utility function. It involves generating random draws $X = x^{(j)}$ from the target distribution $\pi$ and then estimating $\Psi$ by

$$\hat\Psi = \frac{1}{k}\left\{U(x^{(1)}) + \cdots + U(x^{(k)})\right\} \qquad (2)$$

In our context, $X = \{\Theta, \Phi\}$ is a vector consisting of decision parameters, $\Theta$, and system parameters and state variables, $\Phi$. The Monte Carlo method is thus a numerical method for evaluating the integral in Eq. (1). In addition, if the random draws $x^{(j)}$ are independent, we can easily obtain an estimate of the error of the approximation using the Central Limit Theorem (see e.g., Fishmann, 1996, section 2.2).

Often, it is an advantage to reformulate the integral in Eq. (1) by splitting $\Phi$ into the so-called state of nature, $\Phi_O$, and the parameters and state variables $\Phi_s = \{\Phi_{1s}, \Phi_{2s}, \ldots, \Phi_{Ts}\}$ that are calculated by the model. (The additional index denotes the model step, e.g., model time.) A subset $\Omega$ of $\Phi_s$ is called the output of the model. This splitting of the parameter vector leads to a reformulation of Eq. (1):

$$\Psi = E_{\pi_O}\left\{E_{\pi_{s|O}}\{U(X)\}\right\} = \int \left[\int U(x)\,\frac{\pi(x)}{\pi_O(\Phi_O)}\,d\{\Theta, \Phi_s\}\right] \pi_O(\Phi_O)\,d\Phi_O \qquad (3)$$

where $E_{\pi_{s|O}}\{U(X)\}$ denotes the conditional expectation of $U(X)$ for a given state of nature $\Phi_O$. The dimension of $\Phi_O$ will in general be fixed by the model structure, while the number of elements in $\Phi_s$ will vary with different decisions and different combinations of the other elements in $\Phi$.
Disregarding the problem of dimensionality, the integration with respect to $\Phi_O$ is well behaved and lends itself to techniques other than simple Monte Carlo simulation. In contrast, the integration with respect to $\Phi_s$ is so complex that it is only feasible to solve using the Monte Carlo method. (Note that the dimension of $\Phi_O$ in such models often exceeds one hundred, so even though the integrand is well behaved, evaluating the integral is complicated.)
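Eqs. (2) and (3) can be illustrated with a minimal sketch, assuming user-supplied functions for drawing the state of nature, running the model and evaluating $U$. All function names and the toy model in the usage lines are hypothetical and are not part of the Dina pig model. The nested version draws $\Phi_O$ once per outer replicate and averages several model runs given that draw. Because the outer replicates are independent, the Central Limit Theorem gives a standard error from their spread.

```python
import math
import random
import statistics


def mc_estimate(draw, utility, k, rng):
    """Plain Monte Carlo estimate of Psi = E_pi{U(X)} as in Eq. (2).

    The standard error uses the Central Limit Theorem and is only valid
    when the k draws x^(j) are independent.
    """
    values = [utility(draw(rng)) for _ in range(k)]
    psi_hat = statistics.fmean(values)
    std_err = statistics.stdev(values) / math.sqrt(k)
    return psi_hat, std_err


def nested_mc_estimate(draw_state_of_nature, run_model, utility, n_outer, n_inner, rng):
    """Nested estimate following Eq. (3).

    The outer loop draws Phi_O ~ pi_O; the inner loop averages U over model
    runs given that Phi_O, estimating the conditional expectation E_{pi_s|O}{U(X)}.
    """
    outer_means = []
    for _ in range(n_outer):
        phi_o = draw_state_of_nature(rng)
        inner = [utility(run_model(phi_o, rng)) for _ in range(n_inner)]
        outer_means.append(statistics.fmean(inner))
    psi_hat = statistics.fmean(outer_means)
    # Outer replicates are independent, so the CLT applies to their means.
    std_err = statistics.stdev(outer_means) / math.sqrt(n_outer)
    return psi_hat, std_err


if __name__ == "__main__":
    # Hypothetical toy model: Phi_O is a mean level, the model output adds unit noise,
    # and U is the identity, so the true Psi is 2.0.
    rng = random.Random(1)
    print(nested_mc_estimate(lambda r: r.gauss(2.0, 0.5),
                             lambda phi, r: phi + r.gauss(0.0, 1.0),
                             lambda w: w, 2000, 5, rng))
```

The split between outer and inner loops mirrors the distinction in the text: the outer integral over $\Phi_O$ is the well-behaved part, and the inner part is where the full simulation model runs.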

1.2 Additional information concerning the state of nature
Often, we want to use the model in a specific context, e.g., to predict the effect of different production strategies within a specific herd. In this case, we have additional information concerning the model parameters, i.e., registrations $y$ related to the model parameters. We are then interested in basing our inference on the conditional distribution of the parameters given the observations, $\pi_O(\Phi_O \mid y)$. Note that this implies that we must additionally specify a model of the joint distribution of the parameters $\Phi_O$ and the observations $y$. But this is exactly the purpose of our simulation model: the output parameters $\Omega$ are usually observable, and the observations $y$ are a subset of $\Omega$. Jørgensen (2000a) describes the calibration of model parameters with observations of model output parameters. However, due to complexity issues this approach has limitations. In the present context, we will therefore concentrate on the situation where we are able to specify an alternative model of the relation between the observations and the model parameters, and assume $\pi(\omega \mid Y = y, \Phi_O) = \pi(\omega \mid \Phi_O)$, so that

$$\pi(\omega, \Phi_O \mid Y = y) = \pi(\omega \mid \Phi_O)\,\pi_O(\Phi_O \mid Y = y)$$

There is an inherent inconsistency in this approach, as $y \subset \Omega$. We may, however, argue that $y$ is independent of the features we explore in the model, i.e., decisions and capacity restrictions.

The problem handled in the present paper is how to specify the joint probability distribution of the parameters, $\pi_O(\Phi_O)$, so that pseudo-random instantiations $\Phi_O^{(i)}$ can be drawn from it. The presentation is structured as a description of three cases.
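The factorisation $\pi(\omega, \Phi_O \mid Y = y) = \pi(\omega \mid \Phi_O)\,\pi_O(\Phi_O \mid Y = y)$ suggests composition sampling: first draw the state of nature from its posterior, then run the model given that draw. The sketch below assumes, purely for illustration, a single herd-specific proportion with a Beta prior and binomial herd records, so the posterior is available in closed form through Beta-binomial conjugacy. The "model" is a hypothetical one-step binomial stand-in, not the Dina pig model.

```python
import random


def draw_phi_given_y(alpha, beta, successes, trials, rng):
    """Draw Phi_O ~ pi_O(Phi_O | y) for a Beta(alpha, beta) prior and binomial records y.

    Beta-binomial conjugacy gives the posterior Beta(alpha + successes, beta + trials - successes).
    """
    return rng.betavariate(alpha + successes, beta + trials - successes)


def run_model(phi, n_future, rng):
    """Hypothetical stand-in for pi(omega | Phi_O): the count of events among n_future animals."""
    return sum(rng.random() < phi for _ in range(n_future))


def sample_output_given_y(alpha, beta, successes, trials, n_future, n_draws, seed=0):
    """Composition sampling: Phi_O^(i) ~ pi_O(. | y), then omega^(i) ~ pi(. | Phi_O^(i))."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_draws):
        phi = draw_phi_given_y(alpha, beta, successes, trials, rng)
        outputs.append(run_model(phi, n_future, rng))
    return outputs


if __name__ == "__main__":
    # Hypothetical numbers: population prior Beta(2, 18), herd records of 30 events in 100 cases.
    draws = sample_output_given_y(2, 18, 30, 100, n_future=50, n_draws=4000)
    print(sum(draws) / len(draws))
```

With these hypothetical numbers the posterior is Beta(32, 88), so the simulated outputs should average close to $50 \times 32/120 \approx 13.3$, and their spread includes both the parameter uncertainty and the binomial noise of the model.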
The approach described in this paper is used in the Dina pig simulation model (Jørgensen & Kristensen, 1995) and may be combined with recent advances within Bayesian statistics.

1.3 The framework for specification
The specification of the prior distribution is similar to the specification needed in Bayesian approaches to statistical analysis and to learning in expert systems (Spiegelhalter et al., 1993, 1996). One widely used program is WinBUGS (Spiegelhalter et al., 1999), which is intended for inference in graphical models using the Markov Chain Monte Carlo approach. The original intention in the Dina pig model was to use the WinBUGS language for the specification. However, in most cases using WinBUGS directly would be too inconvenient. Under the assumption of independence between parameters, the graphical model is simply a set of disconnected nodes. Therefore, the model specification in the Dina pig model follows the specification language in WinBUGS, but is integrated into the general model specification.
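A set of disconnected nodes can be specified as a simple table with, for each parameter, a distribution, its hyper-parameters and a transformation to the parameter scale. The sketch below borrows the WinBUGS distribution names `dnorm` (mean and precision) and `dunif`; the parameter names, values and the table layout itself are hypothetical and do not reproduce the Dina pig model's specification format.

```python
import math
import random

# Hypothetical specification table: node name -> (distribution, hyper-parameters, transformation).
# As in WinBUGS, dnorm is parameterised by mean and precision (tau = 1 / sigma^2).
SPEC = {
    "piglet_mortality": ("dnorm", (-2.2, 4.0), "ilogit"),  # proportion, normal on logit scale
    "days_to_oestrus": ("dnorm", (1.7, 25.0), "exp"),      # positive time, lognormal
    "feed_waste_pct": ("dunif", (2.0, 6.0), "identity"),
}

TRANSFORMS = {
    "identity": lambda z: z,
    "exp": math.exp,
    "ilogit": lambda z: 1.0 / (1.0 + math.exp(-z)),
}


def draw_node(dist, hyper, rng):
    """Draw one value on the specification scale."""
    if dist == "dnorm":
        mean, precision = hyper
        return rng.gauss(mean, 1.0 / math.sqrt(precision))
    if dist == "dunif":
        low, high = hyper
        return rng.uniform(low, high)
    raise ValueError(f"unsupported distribution: {dist}")


def draw_state_of_nature(spec, rng):
    """One instantiation Phi_O^(i); with independent nodes each is drawn on its own."""
    return {name: TRANSFORMS[link](draw_node(dist, hyper, rng))
            for name, (dist, hyper, link) in spec.items()}


if __name__ == "__main__":
    rng = random.Random(3)
    print(draw_state_of_nature(SPEC, rng))
```

Keeping the transformation separate from the distribution means that, for example, a lognormal parameter is simply a `dnorm` node followed by `exp`, which matches the way the text describes choosing a scale for each parameter.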

2 Case 1: Independent parameters
When specifying a probability distribution for the parameters in the state of nature, a simplifying assumption is that the parameters are independent. In the Dina pig model this is the standard assumption. The independence assumption implies that the joint density of the parameters is simply the product of the densities of the individual parameters, i.e.,

$$\pi(\phi_O) = \pi(\phi_{O,1})\,\pi(\phi_{O,2})\cdots\pi(\phi_{O,n})$$

That is, for each state parameter $\Phi_{O,i}$ in the model, instead of only specifying the expected value, we have to select a probability distribution and the parameters describing this distribution. Following standard practice, we use the term hyper-parameters for these. Very often it is most natural to specify the distribution of the parameter on a different scale than that of the actual parameter. Parameters describing proportions may be specified on a logit scale, and a log-normal distribution may be natural for some parameters. For example, parameters describing the time until an event (i.e., positive values) may often be described as following a lognormal distribution. In that case, the normal distribution is selected with the corresponding hyper-parameters, and the transformation is the exponential function exp(). The available distributions and transformations closely follow the notation in the WinBUGS manual.

2.1 Specification of growth related parameters
One of the available growth models in the Dina pig model is an extension of a simple Gompertz growth model, as described in Jørgensen (1998). We will use this model as an example of the specification of the prior distribution. The Gompertz growth model in its standard form is

$$\frac{dW_t}{dt} = k\,\{K - \ln(W_t)\}\,W_t \qquad (4)$$

where $W_t$ is the weight at time $t$, $k$ is the growth rate and $K$ is the logarithm of the asymptotic maximum weight. This produces a sigmoid curve that closely corresponds to the growth of the pig. Notice that the description of $K$ should not be taken literally.
Extrapolation from measurements during the slaughter pig growth phase to the age when maximum weight is approached is not reliable. The basic formula in Eq. (4) is modified in the simulation model, but it may still be recognised. A growth parameter called the current herd level at time $t$, $k_{ht}$, follows a first order stationary autoregressive process

$$k_{h(t+\Delta t)} = \mu_{kh} + \alpha(\Delta t)\,(k_{ht} - \mu_{kh}) + \beta(\Delta t)\,\varepsilon_h \qquad (5)$$

where $\mu_{kh}$ is the expected level (e.g., the population expectation), $\varepsilon_h \sim N(0, \sigma_h^2)$ is random noise, and

$$\alpha(\Delta t) = \exp(-\alpha_0\,\Delta t), \qquad \beta(\Delta t) = \sqrt{1 - \alpha(\Delta t)^2}$$

are the autoregression parameters with the varying length of the time steps taken into account. $1 - \alpha_0$ corresponds approximately to the usual autoregression parameter with time step 1. The individual growth parameter for each pig is $k_{pig}$. It is drawn from a normal distribution with expectation equal to the herd level at the time of the pig's introduction into the herd, i.e., $k_{pig} \sim N(k_{ht}, \sigma_k^2)$, with $t$ the time at which the pig is introduced into the herd.

The specification of the herd level of growth rate, $k_h$, will be based on estimates of daily gain from production databases. Typically, the herd level of daily gain varies between 700 and 1000 g, which, roughly speaking, corresponds to a standard deviation of $300/4 \approx 75$ g. As the daily gain $dg = f(k, K)$ is a function of the $k$ parameter as well as the $K$ parameter, we use the first order Taylor approximation

$$dg \approx f(k_0, K_0) + f'_k(k_0, K_0)(k - k_0) + f'_K(k_0, K_0)(K - K_0)$$

as the basis for an approximate variance

$$V(dg) \approx \{f'_k(k_0, K_0)\}^2\,V(k) + \{f'_K(k_0, K_0)\}^2\,V(K)$$

Furthermore, we assume that 90% of the variation is due to variation in $k_h$. From these assumptions we find that $K \sim N(5.40,\ 1/\ldots)$ and $k_h \sim N(0.0116,\ 1/\ldots)$. (The normal distribution is parameterised with mean and precision $(= 1/\sigma^2)$, following WinBUGS.) With the mean parameter values, the average daily gain is 885 g, based on the growth of a single animal from 77 to 175 days of age. $\alpha_0$ is selected to obtain a correlation between herd levels 3 months apart of between 0.95 and 0.99, i.e., $\alpha_0 \sim N(0.0003,\ 1/\ldots)$. The variation in herd level, $\sigma_h$, is specified to reflect that the variance within the herd is assumed to be between 0.25 and 0.5 of the total variance between herds. A lognormal scale is assumed, i.e., $\log(\sigma_h^2) \sim N(-6.6,\ \ldots)$. Variation between pigs consists of a genetic part and a random walk part.
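The herd-level process in Eq. (5) and the draw of $k_{pig}$ can be sketched as follows, assuming the process starts from its stationary distribution. Because $\beta(\Delta t) = \sqrt{1 - \alpha(\Delta t)^2}$, the variance $\sigma_h^2$ is preserved for any step length, so irregular time steps can be used directly. The function names and the value of $\sigma_h$ in the usage lines are hypothetical; $\mu_{kh} = 0.0116$ and $\alpha_0 = 0.0003$ are the central values from the text.

```python
import math
import random


def herd_level_path(mu_kh, sigma_h, alpha0, times, rng, k_start=None):
    """Simulate k_h at the given increasing times with the stationary AR(1) of Eq. (5).

    alpha(dt) = exp(-alpha0 * dt) and beta(dt) = sqrt(1 - alpha(dt)^2), so the
    stationary variance sigma_h^2 is kept whatever the step length dt.
    """
    k = rng.gauss(mu_kh, sigma_h) if k_start is None else k_start
    path = [k]
    for t_prev, t_next in zip(times, times[1:]):
        a = math.exp(-alpha0 * (t_next - t_prev))
        b = math.sqrt(1.0 - a * a)
        k = mu_kh + a * (k - mu_kh) + b * rng.gauss(0.0, sigma_h)
        path.append(k)
    return path


def draw_pig_growth_rate(k_h_at_entry, sigma_k, rng):
    """Individual growth parameter k_pig ~ N(k_ht, sigma_k^2) at the pig's entry time t."""
    return rng.gauss(k_h_at_entry, sigma_k)


if __name__ == "__main__":
    rng = random.Random(7)
    weeks = [7 * w for w in range(53)]  # weekly time points over one year (days)
    path = herd_level_path(0.0116, 0.0005, 0.0003, weeks, rng)  # sigma_h is hypothetical
    print(draw_pig_growth_rate(path[10], 0.001, rng))          # sigma_k is hypothetical
    # Correlation between herd levels 90 days apart implied by alpha0 = 0.0003:
    print(math.exp(-0.0003 * 90))
```

The last line reproduces the check described in the text: with $\alpha_0 = 0.0003$ the implied correlation over about three months lies between 0.95 and 0.99.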
The specification of the between-pig variation is based on the assumption that after 90 days of growth the width of the confidence interval for live weight is 30 kg, corresponding to a standard deviation of $30/4 = 7.5$ kg. Between 1/3 and 2/3 of the variance is permanent, corresponding to a $\sigma_k$ in the interval $[\ldots, \ldots]$. Therefore we select the following distribution for $\sigma_k$: $\sigma_k \sim N(\ldots, \ldots)$. The random walk part corresponds to an additional standard deviation in daily gain uniformly distributed in $[0.25, 0.75]$.

Similar considerations are made for the specification of the other model parameters, i.e., parameters describing start weight, feed intake, feed waste, slaughter waste (killing-out percentage), and the relationship between live weight and meat percentage. The available space does not allow us to present the data. Using the Dina pig model, the kernel density plots shown in Fig. 1 are produced for each input parameter.

[Figure 1: Prior distribution of variables (panels: k, Std.k, Std.dailygain). The rug indicates the values used in actual simulation runs.]
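The link between the growth parameters and daily gain used in the Taylor approximation can be made concrete with the closed-form solution of Eq. (4): writing $y = \ln W$ gives $dy/dt = k(K - y)$, so $\ln W_t = K - (K - \ln W_0)\,e^{-kt}$. The sketch below uses that solution of the unmodified Eq. (4), not the modified model in the Dina pig model, and takes the partial derivatives for the variance approximation by central differences. The start weight of 30 kg in the usage lines is an assumption; the text does not state it.

```python
import math


def gompertz_weight(w0, k, K, days):
    """Solution of Eq. (4), dW/dt = k (K - ln W) W: ln W_t = K - (K - ln W_0) exp(-k t)."""
    return math.exp(K - (K - math.log(w0)) * math.exp(-k * days))


def average_daily_gain(k, K, w0, days):
    """Average gain (kg/day) over `days` days starting from weight w0 (kg)."""
    return (gompertz_weight(w0, k, K, days) - w0) / days


def delta_method_variance(f, k0, K0, var_k, var_K, h=1e-6):
    """First-order Taylor approximation V(dg) ~ f_k^2 V(k) + f_K^2 V(K).

    k and K are treated as independent, and the partial derivatives at (k0, K0)
    are approximated by central differences with step h.
    """
    f_k = (f(k0 + h, K0) - f(k0 - h, K0)) / (2 * h)
    f_K = (f(k0, K0 + h) - f(k0, K0 - h)) / (2 * h)
    return f_k ** 2 * var_k + f_K ** 2 * var_K


if __name__ == "__main__":
    # Growth from 77 to 175 days of age (98 days) with the mean values from the text;
    # the 30 kg start weight is an assumption.
    dg = lambda k, K: average_daily_gain(k, K, w0=30.0, days=98)
    print(dg(0.0116, 5.40))
```

Under the assumed 30 kg start weight, this gives about 0.88 kg/day. That is close to the 885 g quoted in the text, but the agreement depends on the assumed start weight. Inverting `delta_method_variance` for a target variance of daily gain is how the text's assumptions (a 75 g standard deviation, with 90% attributed to $k_h$) translate into a variance for $k_h$.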

3 Case 2: Using samples from Markov Chain Monte Carlo
The second case is taken from a study concerning the precision of clinical diagnosis (Bådsgaard & Jørgensen, 2000). The results will be used in Section 4 as well. The case has been selected because it illustrates a situation that is almost standard when using simulation models for decision support. A hierarchical model is used for describing a population of subjects based on empirical data. When using the simulation model, we want to refer to a subject (e.g., a herd) from this population, either with no further information on the subject or with some additional information (e.g., previous performance) on the subject. In this case, the prior distribution is estimated using the Markov Chain Monte Carlo approach via WinBUGS.

For clinical diseases, estimation of herd prevalence relies on how precise the veterinarian is. The precision is usually expressed as sensitivity, SE, the probability of correct identification of a diseased animal, and specificity, SP, the probability of correct identification of a healthy animal. SE and SP influence the observed prevalence in the herd. Consider the case where a veterinarian inspects 10 animals. We want to estimate the probability of observing $n_{obs}$ diseased animals, conditional on the true prevalence in the herd, $p_{dis}$, i.e.,

$$\Pr(n_{obs} = i \mid p_{dis} = p) = \iint \Pr(n_{obs} = i \mid p_{dis} = p, SE = u, SP = v)\,\pi(u, v)\,du\,dv \qquad (6)$$

where $\pi(u, v)$ is the joint probability density of the sensitivity and specificity of the veterinarian. In this context, the simulation model is very simple: with known parameters, the observed number of diseased animals simply follows the binomial distribution. However, the parameters are not known. Our knowledge concerning the specificity and sensitivity of the veterinarian may arise either from specific knowledge of the veterinarian based on previous observations, or from our general knowledge concerning the population of veterinarians. This is illustrated in Fig. 2.
In the study (Bådsgaard & Jørgensen, 2000), the distribution ($Vet_{pop}$) of the precision parameters ($Vet_i = \{SE, SP\}$) was quantified using an experimental setup where 4 veterinarians (only two in the figure) simultaneously assessed clinical symptoms of a total of 155 animals. In the present context, we want to use the information from this study to estimate $\pi(u, v)$ in Eq. (6). Two situations will be addressed: a specific veterinarian who participated in the quantification study ($Vet_2$), and a different veterinarian selected at random from the population ($Vet_3$).

[Figure 2: Schematic illustration of the clinical setup and our study (quantification study and "simulation" model; nodes include Vet_pop, Vet_1, Vet_2, Vet_3, symptoms Symp_i, animal states State_i, and Herd_0, Herd_1, Herd_2, Herd_pop).]

The WinBUGS analysis produces a sample $\{(SE^{(1)}, SP^{(1)}), (SE^{(2)}, SP^{(2)}), \ldots, (SE^{(n)}, SP^{(n)})\}$ from the relevant distributions, as illustrated by the kernel densities in Fig. 3 ($n$ is the sample size). We might be able to approximate the joint distribution using some standard probability distribution such as the multivariate normal distribution. However, it is not obvious to what extent the parameters follow such a distribution, and the effort would be wasted anyway: for our purpose, we need exactly what the MCMC approach produces, a sample from the correct joint distribution. In the present case a sample of … was produced.

To estimate the probability in Eq. (6) for a given true prevalence $p_h$, we proceed as follows. First, the probability of observing disease symptoms is calculated for each draw $i$:

$$p_o^{(i)} = SE^{(i)}\,p_h + (1 - p_h)(1 - SP^{(i)})$$

Then the number of animals with disease symptoms, $n_o^{(i)}$, is drawn from the binomial distribution, i.e., $n_o^{(i)} \sim \text{Binomial}(10, p_o^{(i)})$. Finally, the distribution of $n_o^{(i)}$ over all $i$ is used for finding the desired probabilities.

In the present case the "simulation" model is so simple that calculation time and sample size are of (almost) no concern. However, with more complicated simulation models this issue becomes important. In contrast to the previous case, the samples produced by the MCMC approach are not independent. Therefore, the precision of the output from the simulation model is not simply $\hat\sigma/\sqrt{n}$.
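The three steps above translate directly into code. In the sketch below, the WinBUGS output is replaced by a hypothetical stand-in sample with independent logit-normal SE and SP; real MCMC output would preserve whatever dependence the posterior has between SE and SP, which is the point of using it. The second function is a variant not described in the text: it averages the binomial probabilities over the draws instead of drawing $n_o^{(i)}$, which removes the binomial sampling noise.

```python
import math
import random
from collections import Counter


def binomial_draw(n, p, rng):
    """Number of successes in n Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))


def observed_count_distribution(se_sp_sample, p_h, n_animals=10, seed=0):
    """Monte Carlo version of Eq. (6) for a true herd prevalence p_h.

    For each draw (SE^(i), SP^(i)):
      p_o^(i) = SE^(i) * p_h + (1 - p_h) * (1 - SP^(i))
      n_o^(i) ~ Binomial(n_animals, p_o^(i))
    The relative frequencies of n_o^(i) estimate Pr(n_obs = j | p_dis = p_h).
    """
    rng = random.Random(seed)
    counts = Counter()
    for se, sp in se_sp_sample:
        p_o = se * p_h + (1.0 - p_h) * (1.0 - sp)
        counts[binomial_draw(n_animals, p_o, rng)] += 1
    total = len(se_sp_sample)
    return [counts[j] / total for j in range(n_animals + 1)]


def expected_count_distribution(se_sp_sample, p_h, n_animals=10):
    """Variant: average the binomial probabilities over the draws instead of sampling n_o^(i)."""
    dist = [0.0] * (n_animals + 1)
    for se, sp in se_sp_sample:
        p_o = se * p_h + (1.0 - p_h) * (1.0 - sp)
        for j in range(n_animals + 1):
            dist[j] += math.comb(n_animals, j) * p_o ** j * (1.0 - p_o) ** (n_animals - j)
    return [d / len(se_sp_sample) for d in dist]


def hypothetical_posterior_sample(n, seed=1):
    """Stand-in for WinBUGS output (assumption): independent logit-normal SE and SP."""
    rng = random.Random(seed)
    inv_logit = lambda z: 1.0 / (1.0 + math.exp(-z))
    return [(inv_logit(rng.gauss(1.5, 0.4)), inv_logit(rng.gauss(2.5, 0.4))) for _ in range(n)]


if __name__ == "__main__":
    sample = hypothetical_posterior_sample(5000)
    for p_h in (0.0, 0.1, 0.3):
        print(p_h, [round(p, 3) for p in expected_count_distribution(sample, p_h)])
```

Each row printed corresponds to one row of a table such as Tables 1 and 2: the distribution of the number of clinically diseased animals among 10 inspected, for a given true herd prevalence.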

[Figure 3: Kernel density estimates on logit scale of sensitivity (a) and specificity (b) for a random veterinarian and for veterinarian no. 1. Axes: density against logit(se) and logit(sp).]

[Table 1: Distribution of number of diseased from clinical inspection with different herd health states. Random veterinarian. Rows: herd health; columns: number of diseased (clinical).]

[Table 2: Distribution of number of diseased from clinical inspection with different herd health states. Vet. no. 1 from experiment. Rows: herd health; columns: number of diseased (clinical).]
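Because successive MCMC draws are autocorrelated, the naive $\hat\sigma/\sqrt{n}$ understates the Monte Carlo error, as noted above. The sketch below contrasts it with the batch-means estimate, a common technique that the paper does not describe, and shows thinning as a simple slice. The correlated chain is a hypothetical stand-in (a stationary AR(1) series), not actual WinBUGS output, and the choice of 20 batches is an arbitrary assumption.

```python
import math
import random
import statistics


def naive_standard_error(x):
    """sigma_hat / sqrt(n): correct only for independent draws."""
    return statistics.stdev(x) / math.sqrt(len(x))


def batch_means_standard_error(x, n_batches=20):
    """Batch-means estimate of the Monte Carlo standard error for correlated draws."""
    size = len(x) // n_batches
    means = [statistics.fmean(x[b * size:(b + 1) * size]) for b in range(n_batches)]
    return statistics.stdev(means) / math.sqrt(n_batches)


def thin(x, step):
    """Keep every `step`-th draw, the remedy mentioned in the text."""
    return x[::step]


def ar1_chain(n, rho, seed=0):
    """Stand-in for correlated MCMC output (assumption): stationary AR(1) with unit variance."""
    rng = random.Random(seed)
    z = rng.gauss(0.0, 1.0)
    out = []
    for _ in range(n):
        z = rho * z + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        out.append(z)
    return out


if __name__ == "__main__":
    chain = ar1_chain(20000, 0.9, seed=2)
    print("naive:", naive_standard_error(chain))
    print("batch means:", batch_means_standard_error(chain))
    print("thinned length:", len(thin(chain, 10)))
```

For an AR(1) chain with lag-one correlation 0.9, the true standard error of the mean is roughly $\sqrt{(1+0.9)/(1-0.9)} \approx 4.4$ times the naive value, so the naive formula would suggest far more precision than the simulation actually delivers.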

3.1 Conclusion
Markov Chain Monte Carlo methods seem ideally suited for specifying prior distributions for use in simulation modelling. Even if standard statistical models such as generalized linear models may be more expedient for experimental analysis, MCMC may still be relevant, because a random sample from the population is produced automatically. The only problem is that the samples are not drawn independently, but to a large extent this can be remedied by thinning the sample.

4 Case 3: Sampling from an expert system (Bayesian network)
The third case is taken from a project concerning intervention strategies for respiratory diseases, as presented in Otto (2000). The system uses the HUGIN™ program to formulate a probabilistic expert system for diagnosis and error detection concerning Mycoplasma. The present example is a slight modification of an example described in detail in Jørgensen (2000b). In addition to the diagnostic network, the final system will include a module for Monte Carlo assessment of the cost-benefit of different control strategies. The prevalence of the disease is expected to depend on management level and two risk factors. The prevalence may be observed either by the farmer or by a veterinarian. The precision of the farmer's observation depends on his ability as a manager. Disease prevalence and quality of management influence growth rate.

[Figure 4: Hugin expert system (nodes: Manage, Risk 1, Risk 2, Gain, Preval, Farm obs, VetFind).]

The quantification of the dependencies in Fig. 4 is based upon Stärk et al. (1998). Two of the risk factors in her Table 10 have been selected with the corresponding parameter estimates. Two additions have been made: the overdispersion has been modelled by a random herd effect, and an additional management factor not included in her study has been added for illustration. The parameters from the logistic regression in Stärk et al. (1998) have thus been supplemented with the effect of management and between-herd variation.

Three factors influence herd prevalence: the management quality (Manage), manure removal in the nursery (Risk1) and the number of pigs per room in the nursery (Risk2). Each factor is categorized into discrete levels. The detailed model parameters are described in Jørgensen (2000b). The parameters are used to specify the necessary probability distribution tables in the Bayesian network in Fig. 4. The Prevalence node is defined as a continuous variable, the prevalence of serologically positive animals. For the purpose of the model, the prevalence node is divided into 5 categories: no disease, 1 to 10 percent disease, 10 to 40 percent disease, 40 to 60 percent disease, and above 60 percent disease. Based on the model, we can calculate the probability of being in each of these categories of disease level for each combination of the parent nodes (risk factors). To illustrate, the probability distribution is shown in Table 3 for a selected part of the combinations of risk factors.

[Table 3: Distribution of herd health level for average management and selected risk factors. Columns: No. of pigs (1st quartile, 2nd quartile), each split by manure removal (< daily, daily, > daily); rows: herd health.]

The next step in the modelling is the specification of problem detection by the farmer. In the present example Farm_obs is defined with two levels, "No problem observed" and "Problem observed". In his daily work, the farmer assesses the disease level continuously, but the assessment is not necessarily very precise. Furthermore, the observations may not lead to problem detection, because the farmer may suppose that he is looking at a normal disease level, i.e., his threshold for problem detection is high.
A natural model of the farmer's observation is that good farmers are more precise in their observations and that they tend to react to lower levels of disease. Based on these assumptions, the probability table of Farm_obs conditional on management quality and health problem is specified.

The next node is the veterinarian's diagnosis, i.e., the veterinarian visits the farm, samples 10 animals at random and makes a clinical inspection of each animal. The outcome of the clinical inspection is the number of diseased animals, i.e., the states of the Vet_Find1 node are {0, ..., 10}. If we know the herd prevalence and the precision of the veterinarian, we can calculate the probability distribution of the number of diseased animals based on the assumptions above. This is exactly the probability table that was specified in Section 3 (Tables 1 and 2). Of course, the table needs to reflect our knowledge concerning the veterinarian.
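Once all probability tables are specified, the network defines a joint distribution that can be sampled. Hugin offers this through its own API (the simulate procedure). The sketch below does not use Hugin. It is a from-scratch stand-in, assuming a reduced version of the network in Fig. 4 with two-level nodes and made-up probability tables, and a per-animal symptom probability in place of the SE/SP model of Section 3. It uses forward (ancestral) sampling and rejects draws that contradict the evidence. This yields independent draws from the conditional distribution, but becomes slow when the evidence is improbable.

```python
import random

# Hypothetical probability tables for a reduced version of Fig. 4 (illustrative values only).
P_MANAGE_GOOD = 0.6
P_RISK_HIGH = 0.4
P_PREVAL_HIGH = {("good", "low"): 0.1, ("good", "high"): 0.3,
                 ("poor", "low"): 0.3, ("poor", "high"): 0.6}
# Pr(problem observed | Manage, Preval): good managers detect more and react earlier.
P_FARM_OBS = {("good", "low"): 0.10, ("good", "high"): 0.90,
              ("poor", "low"): 0.05, ("poor", "high"): 0.60}
# Per-animal probability of clinical symptoms given the prevalence level (simplification).
P_VET_POSITIVE = {"low": 0.08, "high": 0.35}


def forward_sample(rng, n_animals=10):
    """One joint instantiation, drawing each node after its parents."""
    manage = "good" if rng.random() < P_MANAGE_GOOD else "poor"
    risk = "high" if rng.random() < P_RISK_HIGH else "low"
    preval = "high" if rng.random() < P_PREVAL_HIGH[(manage, risk)] else "low"
    farm_obs = rng.random() < P_FARM_OBS[(manage, preval)]
    vet_find = sum(rng.random() < P_VET_POSITIVE[preval] for _ in range(n_animals))
    return {"Manage": manage, "Risk": risk, "Preval": preval,
            "FarmObs": farm_obs, "VetFind": vet_find}


def sample_given_evidence(evidence, n_samples, seed=0, max_tries=10**6):
    """Independent draws from the network conditional on evidence, by rejection sampling."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(max_tries):
        s = forward_sample(rng)
        if all(s[node] == value for node, value in evidence.items()):
            accepted.append(s)
            if len(accepted) == n_samples:
                break
    return accepted


if __name__ == "__main__":
    # Evidence as in the typical use: the farmer has seen a problem, the vet found 4 of 10.
    draws = sample_given_evidence({"FarmObs": True, "VetFind": 4}, 1000)
    poor = sum(d["Manage"] == "poor" for d in draws) / len(draws)
    print("Pr(poor management | evidence) ~", poor)
```

Each accepted draw is a full combination of risk factors consistent with the evidence, so it can be passed on to a cost-benefit simulation of a control strategy, one simulation run per draw.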

In the typical use of the expert system, we need to base our inference on evidence on at least two nodes: the farmer will have detected a problem in the herd, and we will have the result of the veterinarian's inspection of the sample. Conditional on this evidence, we need to advise the farmer whether he should change his production strategy and, if so, how. Our expectations regarding future production strategies will of course depend on the risk factors actually causing the problem. A high stocking rate might suggest an increase of herd size combined with sectioned production. But if management is also poor, the full benefit of sectioned production might not be obtained. The cost-benefit of control strategies thus depends on the combination of risk factors present. The Bayesian network contains the full joint probability of these combinations, and in the program Hugin a random sample from this joint distribution may readily be found using existing procedures in the application programming interface (simulate). In contrast to the output from the MCMC method, successive samples from Hugin are independent samples from the distribution.

5 Conclusion
In the present paper, different approaches towards specification of the prior distribution of "state-of-nature" parameters have been presented. The conclusion is that such a specification can readily be made using off-the-shelf methods, and that the possibilities for handling prior evidence concerning these parameters are good. The only word of caution is that the sample produced by MCMC does not consist of independent instantiations from the distribution, but this can easily be remedied by simply discarding instantiations (thinning). However, it should be noted that the techniques are restricted to relationships between model parameters and evidence where it is not important to use the full simulation model. This applies especially to capacity restrictions and interactions between animals.
The ideas in Jørgensen (2000a) may be used in such cases. Another important aspect not covered is the state of the individual animals currently present in the herd. It is not possible to estimate the current state of an individual in the herd without taking observations and decisions into account: individuals remain in the herd because it has been decided not to cull them. We need techniques to calculate the probability distribution of the current state given the evidence that the animal is alive. If the use of simulation models is restricted to steady-state results of production strategies, we can avoid this problem.

References

Bådsgaard, N.P. & E. Jørgensen (2000). A Bayesian approach to estimating the reliability of clinical observations: With an application to herd prevalence estimation. Preventive Veterinary Medicine, in prep.

Fishmann, G.S. (1996). Monte Carlo: Concepts, Algorithms, and Applications. Springer-Verlag New York, Inc.

Jørgensen, E. (1998). Stochastic modelling of pig production. Working Paper: Growth Models. Dina Notat, 73 pp. URL: eps.

Jørgensen, E. (2000a). Calibration of a Monte Carlo Simulation Model of Disease Spread in Slaughter Pig Units. Computers and Electronics in Agriculture, 25 pp. URL:

Jørgensen, E. (2000b). Elements of Bayesian network specification in an animal health research project. Internal report, Biometry Research Unit, Danish Institute of Agricultural Sciences, pp. URL: diag1504a.pdf.

Jørgensen, E. & A.R. Kristensen (1995). An object oriented simulation model of a pig herd with emphasis on information flow. In FACTs 95, March 7, 8, 9, 1995, Orlando, Florida, Farm Animal Computer Technologies Conference, pp.

Otto, L. (2000). Mycoplasma for pigs in a Bayesian Network: A decision support system. In Proc. "Economic modelling of Animal Health and Farm Management", November 23-24, 2000, Wageningen.

Spiegelhalter, D.J., A.P. Dawid, S.L. Lauritzen & R.G. Cowell (1993). Bayesian Analysis in Expert Systems. Statistical Science, 8(3), pp.

Spiegelhalter, D.J., A. Thomas & N. Best (1996). Computation on Bayesian Graphical Models. Bayesian Statistics, 5, pp.

Spiegelhalter, D.J., A. Thomas, N. Best & W. Gilks (1999). WinBUGS. Version 1.2 User Manual. MRC Biostatistics Unit. URL: uk/bugs/welcome.html.

Stärk, K.D.C., D.U. Pfeiffer & R.S. Morris (1998). Risk factors for respiratory disease in New Zealand pig herds. New Zealand Veterinary Journal, pp.


Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics. Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are

More information

Forecast covariances in the linear multiregression dynamic model.

Forecast covariances in the linear multiregression dynamic model. Forecast covariances in the linear multiregression dynamic model. Catriona M Queen, Ben J Wright and Casper J Albers The Open University, Milton Keynes, MK7 6AA, UK February 28, 2007 Abstract The linear

More information

Statistical Rules of Thumb

Statistical Rules of Thumb Statistical Rules of Thumb Second Edition Gerald van Belle University of Washington Department of Biostatistics and Department of Environmental and Occupational Health Sciences Seattle, WA WILEY AJOHN

More information

Monte Carlo Methods in Finance

Monte Carlo Methods in Finance Author: Yiyang Yang Advisor: Pr. Xiaolin Li, Pr. Zari Rachev Department of Applied Mathematics and Statistics State University of New York at Stony Brook October 2, 2012 Outline Introduction 1 Introduction

More information

Multivariate Normal Distribution

Multivariate Normal Distribution Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues

More information

PREDICTIVE DISTRIBUTIONS OF OUTSTANDING LIABILITIES IN GENERAL INSURANCE

PREDICTIVE DISTRIBUTIONS OF OUTSTANDING LIABILITIES IN GENERAL INSURANCE PREDICTIVE DISTRIBUTIONS OF OUTSTANDING LIABILITIES IN GENERAL INSURANCE BY P.D. ENGLAND AND R.J. VERRALL ABSTRACT This paper extends the methods introduced in England & Verrall (00), and shows how predictive

More information

11. Time series and dynamic linear models

11. Time series and dynamic linear models 11. Time series and dynamic linear models Objective To introduce the Bayesian approach to the modeling and forecasting of time series. Recommended reading West, M. and Harrison, J. (1997). models, (2 nd

More information

Logistic Regression (1/24/13)

Logistic Regression (1/24/13) STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used

More information

Supplement to Call Centers with Delay Information: Models and Insights

Supplement to Call Centers with Delay Information: Models and Insights Supplement to Call Centers with Delay Information: Models and Insights Oualid Jouini 1 Zeynep Akşin 2 Yves Dallery 1 1 Laboratoire Genie Industriel, Ecole Centrale Paris, Grande Voie des Vignes, 92290

More information

Maximum likelihood estimation of mean reverting processes

Maximum likelihood estimation of mean reverting processes Maximum likelihood estimation of mean reverting processes José Carlos García Franco Onward, Inc. jcpollo@onwardinc.com Abstract Mean reverting processes are frequently used models in real options. For

More information

Quantitative Inventory Uncertainty

Quantitative Inventory Uncertainty Quantitative Inventory Uncertainty It is a requirement in the Product Standard and a recommendation in the Value Chain (Scope 3) Standard that companies perform and report qualitative uncertainty. This

More information

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012 Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

More information

Handling attrition and non-response in longitudinal data

Handling attrition and non-response in longitudinal data Longitudinal and Life Course Studies 2009 Volume 1 Issue 1 Pp 63-72 Handling attrition and non-response in longitudinal data Harvey Goldstein University of Bristol Correspondence. Professor H. Goldstein

More information

Equity-Based Insurance Guarantees Conference November 1-2, 2010. New York, NY. Operational Risks

Equity-Based Insurance Guarantees Conference November 1-2, 2010. New York, NY. Operational Risks Equity-Based Insurance Guarantees Conference November -, 00 New York, NY Operational Risks Peter Phillips Operational Risk Associated with Running a VA Hedging Program Annuity Solutions Group Aon Benfield

More information

Time series analysis as a framework for the characterization of waterborne disease outbreaks

Time series analysis as a framework for the characterization of waterborne disease outbreaks Interdisciplinary Perspectives on Drinking Water Risk Assessment and Management (Proceedings of the Santiago (Chile) Symposium, September 1998). IAHS Publ. no. 260, 2000. 127 Time series analysis as a

More information

Tutorial on Markov Chain Monte Carlo

Tutorial on Markov Chain Monte Carlo Tutorial on Markov Chain Monte Carlo Kenneth M. Hanson Los Alamos National Laboratory Presented at the 29 th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Technology,

More information

DECISION MAKING UNDER UNCERTAINTY:

DECISION MAKING UNDER UNCERTAINTY: DECISION MAKING UNDER UNCERTAINTY: Models and Choices Charles A. Holloway Stanford University TECHNISCHE HOCHSCHULE DARMSTADT Fachbereich 1 Gesamtbibliothek Betrtebswirtscrtaftslehre tnventar-nr. :...2>2&,...S'.?S7.

More information

CHAPTER 3 CALL CENTER QUEUING MODEL WITH LOGNORMAL SERVICE TIME DISTRIBUTION

CHAPTER 3 CALL CENTER QUEUING MODEL WITH LOGNORMAL SERVICE TIME DISTRIBUTION 31 CHAPTER 3 CALL CENTER QUEUING MODEL WITH LOGNORMAL SERVICE TIME DISTRIBUTION 3.1 INTRODUCTION In this chapter, construction of queuing model with non-exponential service time distribution, performance

More information

Marketing Mix Modelling and Big Data P. M Cain

Marketing Mix Modelling and Big Data P. M Cain 1) Introduction Marketing Mix Modelling and Big Data P. M Cain Big data is generally defined in terms of the volume and variety of structured and unstructured information. Whereas structured data is stored

More information

Financial Mathematics and Simulation MATH 6740 1 Spring 2011 Homework 2

Financial Mathematics and Simulation MATH 6740 1 Spring 2011 Homework 2 Financial Mathematics and Simulation MATH 6740 1 Spring 2011 Homework 2 Due Date: Friday, March 11 at 5:00 PM This homework has 170 points plus 20 bonus points available but, as always, homeworks are graded

More information

More details on the inputs, functionality, and output can be found below.

More details on the inputs, functionality, and output can be found below. Overview: The SMEEACT (Software for More Efficient, Ethical, and Affordable Clinical Trials) web interface (http://research.mdacc.tmc.edu/smeeactweb) implements a single analysis of a two-armed trial comparing

More information

An analysis of price impact function in order-driven markets

An analysis of price impact function in order-driven markets Available online at www.sciencedirect.com Physica A 324 (2003) 146 151 www.elsevier.com/locate/physa An analysis of price impact function in order-driven markets G. Iori a;, M.G. Daniels b, J.D. Farmer

More information

SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

More information

Similarity Search and Mining in Uncertain Spatial and Spatio Temporal Databases. Andreas Züfle

Similarity Search and Mining in Uncertain Spatial and Spatio Temporal Databases. Andreas Züfle Similarity Search and Mining in Uncertain Spatial and Spatio Temporal Databases Andreas Züfle Geo Spatial Data Huge flood of geo spatial data Modern technology New user mentality Great research potential

More information

One-year reserve risk including a tail factor : closed formula and bootstrap approaches

One-year reserve risk including a tail factor : closed formula and bootstrap approaches One-year reserve risk including a tail factor : closed formula and bootstrap approaches Alexandre Boumezoued R&D Consultant Milliman Paris alexandre.boumezoued@milliman.com Yoboua Angoua Non-Life Consultant

More information

Problem of Missing Data

Problem of Missing Data VASA Mission of VA Statisticians Association (VASA) Promote & disseminate statistical methodological research relevant to VA studies; Facilitate communication & collaboration among VA-affiliated statisticians;

More information

Another Look at Sensitivity of Bayesian Networks to Imprecise Probabilities

Another Look at Sensitivity of Bayesian Networks to Imprecise Probabilities Another Look at Sensitivity of Bayesian Networks to Imprecise Probabilities Oscar Kipersztok Mathematics and Computing Technology Phantom Works, The Boeing Company P.O.Box 3707, MC: 7L-44 Seattle, WA 98124

More information

The Intelligent Pig Barn. Anders Ringgaard Kristensen University of Copenhagen

The Intelligent Pig Barn. Anders Ringgaard Kristensen University of Copenhagen The Intelligent Pig Barn Anders Ringgaard Kristensen University of Copenhagen Who am I? Anders Ringgaard Kristensen: Born 1958 Grew up on a farm in Western Jutland Degrees: 1982: Animal Scientist 1985:

More information

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics. Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

More information

A spreadsheet Approach to Business Quantitative Methods

A spreadsheet Approach to Business Quantitative Methods A spreadsheet Approach to Business Quantitative Methods by John Flaherty Ric Lombardo Paul Morgan Basil desilva David Wilson with contributions by: William McCluskey Richard Borst Lloyd Williams Hugh Williams

More information

Principle of Data Reduction

Principle of Data Reduction Chapter 6 Principle of Data Reduction 6.1 Introduction An experimenter uses the information in a sample X 1,..., X n to make inferences about an unknown parameter θ. If the sample size n is large, then

More information

Bayesian Statistical Analysis in Medical Research

Bayesian Statistical Analysis in Medical Research Bayesian Statistical Analysis in Medical Research David Draper Department of Applied Mathematics and Statistics University of California, Santa Cruz draper@ams.ucsc.edu www.ams.ucsc.edu/ draper ROLE Steering

More information

Confidence Intervals for One Standard Deviation Using Standard Deviation

Confidence Intervals for One Standard Deviation Using Standard Deviation Chapter 640 Confidence Intervals for One Standard Deviation Using Standard Deviation Introduction This routine calculates the sample size necessary to achieve a specified interval width or distance from

More information

Java Modules for Time Series Analysis

Java Modules for Time Series Analysis Java Modules for Time Series Analysis Agenda Clustering Non-normal distributions Multifactor modeling Implied ratings Time series prediction 1. Clustering + Cluster 1 Synthetic Clustering + Time series

More information

Overview of Monte Carlo Simulation, Probability Review and Introduction to Matlab

Overview of Monte Carlo Simulation, Probability Review and Introduction to Matlab Monte Carlo Simulation: IEOR E4703 Fall 2004 c 2004 by Martin Haugh Overview of Monte Carlo Simulation, Probability Review and Introduction to Matlab 1 Overview of Monte Carlo Simulation 1.1 Why use simulation?

More information

Bayesian Statistics: Indian Buffet Process

Bayesian Statistics: Indian Buffet Process Bayesian Statistics: Indian Buffet Process Ilker Yildirim Department of Brain and Cognitive Sciences University of Rochester Rochester, NY 14627 August 2012 Reference: Most of the material in this note

More information

A Basic Introduction to Missing Data

A Basic Introduction to Missing Data John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item

More information

Uncertainty of Power Production Predictions of Stationary Wind Farm Models

Uncertainty of Power Production Predictions of Stationary Wind Farm Models Uncertainty of Power Production Predictions of Stationary Wind Farm Models Juan P. Murcia, PhD. Student, Department of Wind Energy, Technical University of Denmark Pierre E. Réthoré, Senior Researcher,

More information

Imputing Values to Missing Data

Imputing Values to Missing Data Imputing Values to Missing Data In federated data, between 30%-70% of the data points will have at least one missing attribute - data wastage if we ignore all records with a missing value Remaining data

More information

Confidence Intervals for Spearman s Rank Correlation

Confidence Intervals for Spearman s Rank Correlation Chapter 808 Confidence Intervals for Spearman s Rank Correlation Introduction This routine calculates the sample size needed to obtain a specified width of Spearman s rank correlation coefficient confidence

More information

Simple linear regression

Simple linear regression Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

More information

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013 Statistics I for QBIC Text Book: Biostatistics, 10 th edition, by Daniel & Cross Contents and Objectives Chapters 1 7 Revised: August 2013 Chapter 1: Nature of Statistics (sections 1.1-1.6) Objectives

More information

Better decision making under uncertain conditions using Monte Carlo Simulation

Better decision making under uncertain conditions using Monte Carlo Simulation IBM Software Business Analytics IBM SPSS Statistics Better decision making under uncertain conditions using Monte Carlo Simulation Monte Carlo simulation and risk analysis techniques in IBM SPSS Statistics

More information

GLMs: Gompertz s Law. GLMs in R. Gompertz s famous graduation formula is. or log µ x is linear in age, x,

GLMs: Gompertz s Law. GLMs in R. Gompertz s famous graduation formula is. or log µ x is linear in age, x, Computing: an indispensable tool or an insurmountable hurdle? Iain Currie Heriot Watt University, Scotland ATRC, University College Dublin July 2006 Plan of talk General remarks The professional syllabus

More information

TEACHING SIMULATION WITH SPREADSHEETS

TEACHING SIMULATION WITH SPREADSHEETS TEACHING SIMULATION WITH SPREADSHEETS Jelena Pecherska and Yuri Merkuryev Deptartment of Modelling and Simulation Riga Technical University 1, Kalku Street, LV-1658 Riga, Latvia E-mail: merkur@itl.rtu.lv,

More information

STOCHASTIC MODELLING OF WATER DEMAND USING A SHORT-TERM PATTERN-BASED FORECASTING APPROACH

STOCHASTIC MODELLING OF WATER DEMAND USING A SHORT-TERM PATTERN-BASED FORECASTING APPROACH STOCHASTIC MODELLING OF WATER DEMAND USING A SHORT-TERM PATTERN-BASED FORECASTING APPROACH Ir. LAM Shing Tim Development(2) Division, Development Branch, Water Supplies Department. Abstract: Water demand

More information

A Bootstrap Metropolis-Hastings Algorithm for Bayesian Analysis of Big Data

A Bootstrap Metropolis-Hastings Algorithm for Bayesian Analysis of Big Data A Bootstrap Metropolis-Hastings Algorithm for Bayesian Analysis of Big Data Faming Liang University of Florida August 9, 2015 Abstract MCMC methods have proven to be a very powerful tool for analyzing

More information

Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Section 14 Simple Linear Regression: Introduction to Least Squares Regression Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship

More information

Exploratory Data Analysis

Exploratory Data Analysis Exploratory Data Analysis Johannes Schauer johannes.schauer@tugraz.at Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction

More information

Numerical Methods for Option Pricing

Numerical Methods for Option Pricing Chapter 9 Numerical Methods for Option Pricing Equation (8.26) provides a way to evaluate option prices. For some simple options, such as the European call and put options, one can integrate (8.26) directly

More information

The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series.

The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series. Cointegration The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series. Economic theory, however, often implies equilibrium

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

From the help desk: Bootstrapped standard errors

From the help desk: Bootstrapped standard errors The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution

More information

Introduction to Markov Chain Monte Carlo

Introduction to Markov Chain Monte Carlo Introduction to Markov Chain Monte Carlo Monte Carlo: sample from a distribution to estimate the distribution to compute max, mean Markov Chain Monte Carlo: sampling using local information Generic problem

More information

Confidence Intervals for Cp

Confidence Intervals for Cp Chapter 296 Confidence Intervals for Cp Introduction This routine calculates the sample size needed to obtain a specified width of a Cp confidence interval at a stated confidence level. Cp is a process

More information

The problem with waiting time

The problem with waiting time The problem with waiting time Why the only way to real optimization of any process requires discrete event simulation Bill Nordgren, MS CIM, FlexSim Software Products Over the years there have been many

More information

Probability and Statistics

Probability and Statistics Probability and Statistics Syllabus for the TEMPUS SEE PhD Course (Podgorica, April 4 29, 2011) Franz Kappel 1 Institute for Mathematics and Scientific Computing University of Graz Žaneta Popeska 2 Faculty

More information

Bayesian Statistics in One Hour. Patrick Lam

Bayesian Statistics in One Hour. Patrick Lam Bayesian Statistics in One Hour Patrick Lam Outline Introduction Bayesian Models Applications Missing Data Hierarchical Models Outline Introduction Bayesian Models Applications Missing Data Hierarchical

More information

Optimal Stopping in Software Testing

Optimal Stopping in Software Testing Optimal Stopping in Software Testing Nilgun Morali, 1 Refik Soyer 2 1 Department of Statistics, Dokuz Eylal Universitesi, Turkey 2 Department of Management Science, The George Washington University, 2115

More information

SAS Certificate Applied Statistics and SAS Programming

SAS Certificate Applied Statistics and SAS Programming SAS Certificate Applied Statistics and SAS Programming SAS Certificate Applied Statistics and Advanced SAS Programming Brigham Young University Department of Statistics offers an Applied Statistics and

More information

An Application of Inverse Reinforcement Learning to Medical Records of Diabetes Treatment

An Application of Inverse Reinforcement Learning to Medical Records of Diabetes Treatment An Application of Inverse Reinforcement Learning to Medical Records of Diabetes Treatment Hideki Asoh 1, Masanori Shiro 1 Shotaro Akaho 1, Toshihiro Kamishima 1, Koiti Hasida 1, Eiji Aramaki 2, and Takahide

More information

Model Calibration with Open Source Software: R and Friends. Dr. Heiko Frings Mathematical Risk Consulting

Model Calibration with Open Source Software: R and Friends. Dr. Heiko Frings Mathematical Risk Consulting Model with Open Source Software: and Friends Dr. Heiko Frings Mathematical isk Consulting Bern, 01.09.2011 Agenda in a Friends Model with & Friends o o o Overview First instance: An Extreme Value Example

More information

Adequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection

Adequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection Directions in Statistical Methodology for Multivariable Predictive Modeling Frank E Harrell Jr University of Virginia Seattle WA 19May98 Overview of Modeling Process Model selection Regression shape Diagnostics

More information

Master s Theory Exam Spring 2006

Master s Theory Exam Spring 2006 Spring 2006 This exam contains 7 questions. You should attempt them all. Each question is divided into parts to help lead you through the material. You should attempt to complete as much of each problem

More information

Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University caizhua@gmail.com

Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University caizhua@gmail.com Bayesian Machine Learning (ML): Modeling And Inference in Big Data Zhuhua Cai Google Rice University caizhua@gmail.com 1 Syllabus Bayesian ML Concepts (Today) Bayesian ML on MapReduce (Next morning) Bayesian

More information

CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS

CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS Examples: Regression And Path Analysis CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS Regression analysis with univariate or multivariate dependent variables is a standard procedure for modeling relationships

More information

An Introduction to Using WinBUGS for Cost-Effectiveness Analyses in Health Economics

An Introduction to Using WinBUGS for Cost-Effectiveness Analyses in Health Economics Slide 1 An Introduction to Using WinBUGS for Cost-Effectiveness Analyses in Health Economics Dr. Christian Asseburg Centre for Health Economics Part 1 Slide 2 Talk overview Foundations of Bayesian statistics

More information

Analysis of Financial Time Series

Analysis of Financial Time Series Analysis of Financial Time Series Analysis of Financial Time Series Financial Econometrics RUEY S. TSAY University of Chicago A Wiley-Interscience Publication JOHN WILEY & SONS, INC. This book is printed

More information

Chapter 14 Managing Operational Risks with Bayesian Networks

Chapter 14 Managing Operational Risks with Bayesian Networks Chapter 14 Managing Operational Risks with Bayesian Networks Carol Alexander This chapter introduces Bayesian belief and decision networks as quantitative management tools for operational risks. Bayesian

More information

Prentice Hall Algebra 2 2011 Correlated to: Colorado P-12 Academic Standards for High School Mathematics, Adopted 12/2009

Prentice Hall Algebra 2 2011 Correlated to: Colorado P-12 Academic Standards for High School Mathematics, Adopted 12/2009 Content Area: Mathematics Grade Level Expectations: High School Standard: Number Sense, Properties, and Operations Understand the structure and properties of our number system. At their most basic level

More information

DATA INTERPRETATION AND STATISTICS

DATA INTERPRETATION AND STATISTICS PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

More information

Machine Learning Logistic Regression

Machine Learning Logistic Regression Machine Learning Logistic Regression Jeff Howbert Introduction to Machine Learning Winter 2012 1 Logistic regression Name is somewhat misleading. Really a technique for classification, not regression.

More information

Practical Applications of Stochastic Modeling for Disability Insurance

Practical Applications of Stochastic Modeling for Disability Insurance Practical Applications of Stochastic Modeling for Disability Insurance Society of Actuaries Session 8, Spring Health Meeting Seattle, WA, June 007 Practical Applications of Stochastic Modeling for Disability

More information

A Bayesian Antidote Against Strategy Sprawl

A Bayesian Antidote Against Strategy Sprawl A Bayesian Antidote Against Strategy Sprawl Benjamin Scheibehenne (benjamin.scheibehenne@unibas.ch) University of Basel, Missionsstrasse 62a 4055 Basel, Switzerland & Jörg Rieskamp (joerg.rieskamp@unibas.ch)

More information

Pr(X = x) = f(x) = λe λx

Pr(X = x) = f(x) = λe λx Old Business - variance/std. dev. of binomial distribution - mid-term (day, policies) - class strategies (problems, etc.) - exponential distributions New Business - Central Limit Theorem, standard error

More information

Inequality, Mobility and Income Distribution Comparisons

Inequality, Mobility and Income Distribution Comparisons Fiscal Studies (1997) vol. 18, no. 3, pp. 93 30 Inequality, Mobility and Income Distribution Comparisons JOHN CREEDY * Abstract his paper examines the relationship between the cross-sectional and lifetime

More information

The Performance of Option Trading Software Agents: Initial Results

The Performance of Option Trading Software Agents: Initial Results The Performance of Option Trading Software Agents: Initial Results Omar Baqueiro, Wiebe van der Hoek, and Peter McBurney Department of Computer Science, University of Liverpool, Liverpool, UK {omar, wiebe,

More information