Estimating the evidence for statistical models
1 Estimating the evidence for statistical models. Nial Friel, University College Dublin. March 2011.
2 Introduction: Bayesian model choice
Given data y and competing models m_1, ..., m_l, each with parameters θ_1, ..., θ_l, respectively.
Bayesian inference: π(θ_k, m_k | y) ∝ π(y | θ_k, m_k) π(θ_k | m_k) p(m_k).
3 Introduction: Model evidence
Within model m_k: π(θ_k | y, m_k) ∝ π(y | θ_k, m_k) π(θ_k | m_k).
The constant of proportionality is
π(y | m_k) = ∫_{θ_k} π(y | θ_k, m_k) π(θ_k | m_k) dθ_k.
This is often called the marginal likelihood, integrated likelihood or evidence, and is difficult to compute in general.
4 Introduction: Posterior model probabilities
Suppose we could compute π(y | m_k). Then, using Bayes' theorem, we get
π(m_k | y) = π(y | m_k) π(m_k) / Σ_{j=1}^{l} π(y | m_j) π(m_j).
5 Introduction: Bayes factors
If we have two competing models:
π(m_1 | y) / π(m_2 | y) = [π(y | m_1) / π(y | m_2)] × [π(m_1) / π(m_2)]
posterior odds = Bayes factor × prior odds.
The Bayes factor is B_12 = π(y | m_1) / π(y | m_2). The larger B_12 is, the greater the evidence in favour of m_1 compared to m_2.
6 Introduction: Bayesian model averaging
Predictions can be made by averaging over all models, weighted proportional to the posterior model probabilities, thereby incorporating model uncertainty:
π(ỹ | y) = Σ_{k=1}^{l} π(ỹ | m_k, y) π(m_k | y),
where ỹ denotes future data. This is the average of the posterior predictive distribution for ỹ under each model, weighted by the corresponding posterior model probabilities.
7 Introduction: Why estimating the model evidence is a challenge
π(y | m_k) is an integral of a (usually) highly variable function over a high-dimensional parameter space. Analytic tractability is sometimes possible, typically where conjugate priors are used, but this is quite rare. Consequently, sophisticated Monte Carlo methods are needed.
8 Introduction: Within-model search or across-model search?
Within-model search: inference for π(θ_k | y) is carried out separately for every m_k, and used to estimate π(y | m_k) for all k. There are many approaches under this heading.
Across-model search: here inference is carried out over the joint model and parameter space, π(θ_k, m_k | y). In an MCMC setting, only one chain is needed! Reversible jump Markov chain Monte Carlo, developed by Green (1995), is the dominant approach (> 1,400 citations to date).
9 Review of evidence estimation: Laplace's method (e.g. Tierney and Kadane, 1986)
Assume that π(θ_k | y) is highly peaked around the posterior mode θ̂_k, e.g. if the sample size is large enough.
Define l(θ_k) = log{π(y | θ_k) π(θ_k)}.
Expand l(θ_k) as a quadratic about θ̂_k and then exponentiate. The result is an approximation to π(y | θ_k) π(θ_k) as a Gaussian with mean θ̂_k and covariance Σ = (−D² l(θ̂_k))^{−1}, where D² l(θ̂_k) is the Hessian matrix of second derivatives.
Integrating this approximation yields
π(y) ≈ (2π)^{d/2} |Σ|^{1/2} π(y | θ̂_k) π(θ̂_k).
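A minimal numerical sketch may help fix ideas. The toy model below (iid N(θ, 1) data with a N(0, 1) prior, so the exact evidence is available in closed form) and all variable names are illustrative assumptions, not part of the slides; in this conjugate case the log joint is exactly quadratic, so Laplace's method is exact up to finite-difference error.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(1)
y = rng.normal(0.5, 1.0, size=50)   # toy data: y_i ~ N(theta, 1), iid
n = len(y)

def log_joint(theta):
    # l(theta) = log pi(y | theta) + log pi(theta), with a N(0, 1) prior on theta
    return norm.logpdf(y, theta, 1.0).sum() + norm.logpdf(theta, 0.0, 1.0)

# Posterior mode theta_hat: maximise l(theta)
theta_hat = minimize_scalar(lambda t: -log_joint(t)).x

# Hessian of l at the mode by a central finite difference (here d = 1)
h = 1e-4
d2l = (log_joint(theta_hat + h) - 2 * log_joint(theta_hat) + log_joint(theta_hat - h)) / h**2
Sigma = -1.0 / d2l                  # Sigma = (-D^2 l(theta_hat))^{-1}

# Laplace approximation: pi(y) ~ (2 pi)^{d/2} |Sigma|^{1/2} pi(y | theta_hat) pi(theta_hat)
log_ev_laplace = 0.5 * np.log(2 * np.pi) + 0.5 * np.log(Sigma) + log_joint(theta_hat)

# Exact log evidence for this conjugate model, for comparison
log_ev_exact = (norm.logpdf(y, 0.0, 1.0).sum()
                + 0.5 * y.sum()**2 / (n + 1.0) - 0.5 * np.log(n + 1.0))
print(log_ev_laplace, log_ev_exact)
```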
10 Review of evidence estimation: Harmonic mean estimator (Newton and Raftery, 1994)
π̂(y) = 1 / [ (1/n) Σ_{i=1}^{n} 1/π(y | θ_i) ],   θ_i ~ π(θ | y).
Why does this hold?
E_{π(θ|y)} [1/π(y | θ)] = ∫ [1/π(y | θ)] × [π(y | θ) π(θ) / π(y)] dθ = (1/π(y)) ∫ π(θ) dθ = 1/π(y).
The bad news?
13 Review of evidence estimation: Harmonic mean estimator (Newton and Raftery, 1994)
π̂(y) = 1 / [ (1/n) Σ_{i=1}^{n} 1/π(y | θ_i) ],   θ_i ~ π(θ | y).
This estimator is based solely on draws from the posterior. But the posterior is typically much more peaked than the prior, e.g. when the posterior is insensitive to the prior. Hence in such situations the harmonic mean estimator will not change much as the prior changes, while π(y) itself is very sensitive to changes in the prior. This drawback is very well documented; see Radford Neal's blog, for example.
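The insensitivity is easy to demonstrate numerically. A hedged sketch on the same illustrative conjugate model as before (data, prior and function names are my own choices): the harmonic mean estimate barely moves as the prior variance changes, while the true log evidence does.

```python
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

rng = np.random.default_rng(2)
y = rng.normal(0.5, 1.0, size=50)   # y_i ~ N(theta, 1), iid; prior theta ~ N(0, v)
n = len(y)

def harmonic_mean_log_ev(prior_var, n_draws=20000):
    # Conjugate model, so we can draw exactly: theta | y ~ N(m_post, v_post)
    v_post = 1.0 / (n + 1.0 / prior_var)
    m_post = v_post * y.sum()       # prior mean taken to be 0
    theta = rng.normal(m_post, np.sqrt(v_post), size=n_draws)
    log_lik = norm.logpdf(y[None, :], theta[:, None], 1.0).sum(axis=1)
    # pi_hat(y) = 1 / [(1/n) sum_i 1/pi(y | theta_i)], computed on the log scale
    return -(logsumexp(-log_lik) - np.log(n_draws))

def exact_log_ev(prior_var):
    v_post = 1.0 / (n + 1.0 / prior_var)
    return (norm.logpdf(y, 0.0, 1.0).sum()
            + 0.5 * v_post * y.sum()**2 + 0.5 * np.log(v_post / prior_var))

for v in (1.0, 10.0, 100.0):
    print(v, harmonic_mean_log_ev(v), exact_log_ev(v))
# The harmonic mean estimate hardly changes with v, while the true log
# evidence drops by roughly 0.5 * log(v) for large v.
```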
14 Review of evidence estimation: Chib's method (Chib, 1995)
Chib (1995) presented a generic method which can be applied to output from the Gibbs sampler. Re-writing
π(θ | y) = π(y | θ) π(θ) / π(y)
gives
π(y) = π(y | θ) π(θ) / π(θ | y).
So we could estimate log π(y) as
log π̂(y) = log π(y | θ*) + log π(θ*) − log π̂(θ* | y),
where π̂(θ* | y) is an estimate of the posterior density at a point θ* of high posterior probability.
15 Review of evidence estimation: Chib's method (Chib, 1995)
Chib's method relies on estimating π(θ* | y). Suppose the vector θ can be partitioned as (θ_1, θ_2, θ_3), where the full-conditional distribution of each θ_i is standard. Then
π(θ* | y) = π(θ*_1 | θ*_2, θ*_3, y) π(θ*_2 | θ*_3, y) π(θ*_3 | y).
Gibbs sampling can be used to estimate each factor on the right-hand side, e.g.
π̂(θ*_2 | θ*_3, y) = (1/N) Σ_j π(θ*_2 | θ_1^{(j)}, θ*_3, y),
π̂(θ*_3 | y) = (1/N) Σ_j π(θ*_3 | θ_1^{(j)}, θ_2^{(j)}, y).
(The first factor is available in closed form; the draws for the second factor come from a reduced Gibbs run with θ_3 fixed at θ*_3.)
16 Review of evidence estimation: Chib's method (Chib, 1995)
In general, Chib's method can be applied when θ is partitioned into an arbitrary number of blocks. The only requirement is that full-conditional sampling of each block is possible.
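Here is a hedged sketch of Chib's identity for a two-block Gibbs sampler on a normal model with unknown mean and variance; the model, the priors (N(0, τ²) on μ, inverse-gamma on σ²) and all names are illustrative assumptions. With two blocks, π̂(μ* | y) is a Rao-Blackwellised average over the main Gibbs run and the second factor is a standard density, so no reduced run is needed.

```python
import numpy as np
from scipy.stats import norm, invgamma

rng = np.random.default_rng(3)
y = rng.normal(1.0, 1.5, size=100)
n = len(y)
tau2, a, b = 10.0, 2.0, 2.0   # priors: mu ~ N(0, tau2), sigma2 ~ IG(a, b)

def mu_cond(sigma2):
    # mu | sigma2, y is Gaussian; return its (mean, variance)
    v = 1.0 / (n / sigma2 + 1.0 / tau2)
    return v * y.sum() / sigma2, v

# Two-block Gibbs sampler, storing post-burn-in draws
N, burn = 5000, 1000
mu, sigma2 = y.mean(), y.var()
mu_draws, sig2_draws = [], []
for j in range(N + burn):
    m, v = mu_cond(sigma2)
    mu = rng.normal(m, np.sqrt(v))
    sigma2 = invgamma.rvs(a + 0.5 * n, scale=b + 0.5 * ((y - mu)**2).sum(),
                          random_state=rng)
    if j >= burn:
        mu_draws.append(mu)
        sig2_draws.append(sigma2)

# A high-posterior-density point theta* = (mu*, sigma2*)
mu_star, sig2_star = np.mean(mu_draws), np.mean(sig2_draws)

# pi_hat(mu* | y): Rao-Blackwellised average over the Gibbs draws of sigma2
dens = []
for s2 in sig2_draws:
    m, v = mu_cond(s2)
    dens.append(norm.pdf(mu_star, m, np.sqrt(v)))
log_post_mu = np.log(np.mean(dens))

# pi(sigma2* | mu*, y) is an inverse-gamma density, known in closed form
log_post_sig2 = invgamma.logpdf(sig2_star, a + 0.5 * n,
                                scale=b + 0.5 * ((y - mu_star)**2).sum())

# Chib's identity: log pi(y) = log pi(y|theta*) + log pi(theta*) - log pi_hat(theta*|y)
log_lik = norm.logpdf(y, mu_star, np.sqrt(sig2_star)).sum()
log_prior = (norm.logpdf(mu_star, 0.0, np.sqrt(tau2))
             + invgamma.logpdf(sig2_star, a, scale=b))
print(log_lik + log_prior - (log_post_mu + log_post_sig2))
```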
17 Review of evidence estimation: Annealed importance sampling (Neal, 2001)
AIS is a very clever algorithm which shows how tempering can be used to define an importance sampling function to sample from complex distributions.
Aside: importance sampling from a target f(x) using an importance function g(x):
x^{(1)}, ..., x^{(N)} ~ g(x).
Then
Ê_f[a(x)] = Σ_i w^{(i)} a(x^{(i)}) / Σ_i w^{(i)},   where w^{(i)} = f(x^{(i)}) / g(x^{(i)}),
and
(1/N) Σ_i w^{(i)} → z_f / z_g as N → ∞,
where z_f = ∫_x f(x) dx and z_g = ∫_x g(x) dx.
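The aside can be checked in a few lines. In this sketch the target and proposal are my own illustrative choices: f is an unnormalised Gaussian (so z_f = √(2π)) and g is a normalised N(0, 4) density (z_g = 1).

```python
import numpy as np

rng = np.random.default_rng(4)

# Unnormalised target f (z_f = sqrt(2*pi)) and importance function g = N(0, 4)
f = lambda x: np.exp(-0.5 * (x - 1.0)**2)
g = lambda x: np.exp(-0.125 * x**2) / np.sqrt(8 * np.pi)   # N(0, 4) density

x = rng.normal(0.0, 2.0, size=100_000)   # x^(i) ~ g
w = f(x) / g(x)                          # w^(i) = f(x^(i)) / g(x^(i))

print(w.mean(), np.sqrt(2 * np.pi))      # (1/N) sum w^(i) -> z_f / z_g
print((w * x).sum() / w.sum())           # self-normalised estimate of E_f[x] -> 1
```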
18 Review of evidence estimation: Annealed importance sampling (Neal, 2001)
Define π_i(θ | y) ∝ π(θ)^{1−t_i} π(θ | y)^{t_i}, where 1 = t_0 > ... > t_n = 0. Thus π_{t_0} and π_{t_n} correspond to the posterior and the prior, respectively. Let T_i denote a Markov transition kernel with invariant distribution π_{t_i}.
For j = 1, ..., N:
Sample θ_{n−1} from π_{t_n};
Sample θ_{n−2} from θ_{n−1} using T_{n−1};
...
Sample θ_0 from θ_1 using T_1.
Set θ^{(j)} = θ_0 and
w^{(j)} = [π_{n−1}(θ_{n−1}) / π_n(θ_{n−1})] × [π_{n−2}(θ_{n−2}) / π_{n−1}(θ_{n−2})] × ... × [π_0(θ_0) / π_1(θ_0)].
19 Review of evidence estimation: Annealed importance sampling
AIS yields:
1. An independent (importance-weighted) sample {θ^{(i)}} from π(θ | y).
2. An estimator of the evidence: π̂(y) = (1/n) Σ_{i=1}^{n} w^{(i)}.
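A compact sketch of AIS on the illustrative conjugate normal model (all settings here, the temperature ladder, the random-walk kernel and its step size, are my own assumptions). Note that π(θ)^{1−t} π(θ|y)^{t} ∝ π(y|θ)^{t} π(θ), so the tempered likelihood path below matches the path defined on the slide.

```python
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

rng = np.random.default_rng(5)
y = rng.normal(0.5, 1.0, size=30)
n, v0 = len(y), 5.0                      # prior: theta ~ N(0, v0)

loglik = lambda th: (-0.5 * n * np.log(2 * np.pi)
                     - 0.5 * ((y[None, :] - th[:, None])**2).sum(axis=1))

N, K = 2000, 50                          # N independent AIS runs, K temperatures
betas = np.linspace(0.0, 1.0, K)         # 0 = prior, ..., 1 = posterior
theta = rng.normal(0.0, np.sqrt(v0), N)  # start from the prior
logw = np.zeros(N)

for k in range(1, K):
    logw += (betas[k] - betas[k - 1]) * loglik(theta)   # weight update
    # One random-walk MH move per particle, invariant for lik^beta_k * prior
    prop = theta + rng.normal(0.0, 0.3, N)
    logr = (betas[k] * (loglik(prop) - loglik(theta))
            + norm.logpdf(prop, 0.0, np.sqrt(v0))
            - norm.logpdf(theta, 0.0, np.sqrt(v0)))
    accept = np.log(rng.uniform(size=N)) < logr
    theta[accept] = prop[accept]

log_ev_ais = logsumexp(logw) - np.log(N)  # pi_hat(y) = (1/N) sum_i w^(i)

# Exact log evidence for this conjugate model, for comparison
vp = 1.0 / (n + 1.0 / v0)
log_ev_exact = (norm.logpdf(y, 0.0, 1.0).sum()
                + 0.5 * vp * y.sum()**2 + 0.5 * np.log(vp / v0))
print(log_ev_ais, log_ev_exact)
```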
20 Review of evidence estimation: Power posteriors. Evidence estimation via power posteriors (NF and Pettitt, 2008)
Consider the power posterior:
π(θ | y, t) ∝ {π(y | θ)}^{T(t)} p(θ),
where T : [0, 1] → [0, 1] is defined s.t. T(0) = 0 and T(1) = 1. Its normalising constant is
z(y | t) = ∫_θ {π(y | θ)}^{t} p(θ) dθ.
z(y | t = 1): the posterior model evidence. z(y | t = 0): the integral of the prior for θ, which equals 1.
21 Review of evidence estimation: Power posteriors. Evidence via power posteriors
The evidence follows from the identity:
log π(y) = log{ z(y | t = 1) / z(y | t = 0) } = ∫_0^1 E_{θ | y, t} [log π(y | θ)] dt.
Proof:
d/dt log z(y | t) = z′(y | t) / z(y | t)
= (1/z(y | t)) ∫ d/dt {π(y | θ)}^{t} π(θ) dθ
= (1/z(y | t)) ∫ log(π(y | θ)) {π(y | θ)}^{t} π(θ) dθ
= E_{θ | y, t} [log π(y | θ)].
22 Review of evidence estimation: Power posteriors. Evidence via power posteriors
d/dt log z(y | t) = E_{θ | y, t} [log π(y | θ)].
This is the mean deviance wrt (θ | y, t), the power posterior. Integrating wrt t yields
log π(y) = log{ z(y | t = 1) / z(y | t = 0) } = ∫_0^1 E_{θ | y, t} [log π(y | θ)] dt.
This is essentially an application of thermodynamic integration, which was first developed in the statistical physics community, and outlined in Gelman and Meng (1998).
23 Review of evidence estimation: Power posteriors. In practice:
Discretise t ∈ [0, 1]: 0 = t_0 < t_1 < ... < t_n = 1.
For each t_i: sample θ ~ π(θ | y, t_i) and estimate E_i = E_{θ | y, t_i} [log π(y | θ)].
Then apply the trapezoidal rule:
log π̂(y) = Σ_{i=1}^{n} (t_i − t_{i−1}) (E_{i−1} + E_i) / 2.
24 Review of evidence estimation: Power posteriors. Sensitivity of π(y) to the prior: a toy example
How does sensitivity to the prior impact on this method? Suppose y = {y_i} iid N(θ, 1) and, a priori, θ ~ N(m, v). Then the power posterior is θ | y, t ~ N(m_t, v_t), where
m_t = (nt ȳ + m/v) / (nt + 1/v)   and   v_t = 1 / (nt + 1/v),
and
E_{θ | y, t} [log π(y | θ)] = −(n/2) log 2π − (1/2) Σ_{i=1}^{n} (y_i − ȳ)² − (n/2) (m − ȳ)² / (vnt + 1)² − (n/2) × 1 / (nt + 1/v).
When t = 0, the final term is −nv/2. As v → ∞, so too does the magnitude of E_{θ | y, t}.
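Because E_i is available in closed form for this toy model, the trapezoidal recipe from the previous slide can be checked directly against the exact conjugate evidence. A minimal sketch (the data, the schedule t = u^5 concentrating points near t = 0, and all names are illustrative assumptions); in a real problem each E_i would instead be estimated by MCMC at temperature t_i.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
n, m, v = 40, 0.0, 10.0                 # y_i ~ N(theta, 1); prior theta ~ N(m, v)
y = rng.normal(0.7, 1.0, size=n)
ybar, ss = y.mean(), ((y - y.mean())**2).sum()

def mean_deviance(t):
    # E_{theta | y, t} log pi(y | theta), using the closed form on the slide
    return (-0.5 * n * np.log(2 * np.pi) - 0.5 * ss
            - 0.5 * n * (m - ybar)**2 / (v * n * t + 1.0)**2
            - 0.5 * n / (n * t + 1.0 / v))

ts = np.linspace(0.0, 1.0, 101)**5      # temperature schedule dense near t = 0
E = mean_deviance(ts)
# Trapezoidal rule, exactly as on the previous slide
log_ev_pp = np.sum(0.5 * (E[1:] + E[:-1]) * np.diff(ts))

# Exact log evidence for this conjugate model
log_ev_exact = (-0.5 * n * np.log(2 * np.pi) - 0.5 * ss
                + 0.5 * np.log(2 * np.pi / n)
                + norm.logpdf(ybar, m, np.sqrt(v + 1.0 / n)))
print(log_ev_pp, log_ev_exact)
```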
25 Review of evidence estimation: Power posteriors
[Figure: expected deviance under θ | y, t plotted against t, for prior variance equal to 10, 5, 1.] As v increases, so too does the rate at which the mean deviance changes with t.
26 Review of evidence estimation: Power posteriors. Connection to the fractional Bayes estimator
The fraction z(y | t = 1) / z(y | t = a), where a is close to 0, is precisely the estimate of the marginal likelihood used in the fractional Bayes estimate of the Bayes factor (O'Hagan, 1995):
z(y | t = 1) / z(y | t = a) = ∫_θ π(y | θ) π(θ) dθ / ∫_θ {π(y | θ)}^{a} π(θ) dθ,
so that
log{ z(y | t = 1) / z(y | t = a) } = ∫_a^1 E_{θ | y, t} [log π(y | θ)] dt.
This method was proposed to compute Bayes factors with non-informative priors: impropriety in π(θ) cancels above and below. Essentially a fraction a of the data is borrowed for the prior.
27 Review of evidence estimation: Power posteriors. The power posterior approach
It is relatively straightforward to code/implement, and it is a generic method. In some cases it can be implemented in WinBUGS. Choosing the temperature schedule is vital; this is the weakness of this approach. Behrens, NF and Hurn (2011) offer some possibilities in this direction.
28 Review of evidence estimation: Nested sampling (Skilling, 2006)
(For the moment, for ease of notation, let L(θ) = π(y | θ).)
π(y) = ∫ L(θ) π(θ) dθ = ∫ L(θ) dX,
where dX = π(θ) dθ is an element of prior mass. Define
X(λ) = ∫_{L(θ) > λ} π(θ) dθ
as a cumulative prior mass, and write the inverse function as L(X), i.e. L(X(λ)) = λ. This allows us to express the evidence as a one-dimensional integral:
π(y) = ∫_0^1 L(X) dX.
31 Review of evidence estimation: Nested sampling
The main computational burden is the requirement to sample θ from the prior subject to the constraint that L(θ) > l. This is roughly similar to the computational effort of slice sampling (Neal, 2003). The evidence is estimated by sorting draws from the prior according to their likelihood:
π̂(y) = Ẑ = Σ_{i=1}^{I−1} (X_i − X_{i+1}) L_i.
32 Review of evidence estimation: Nested sampling. Sketch of algorithm
Sample θ_1, ..., θ_N from the prior. Repeat for i = 1, ..., I:
Find the point θ_k with the smallest likelihood, l_i, among the N current points.
Set X_i = exp(−i/N) and w_i = X_{i−1} − X_i.
Increment Ẑ by L_i w_i, where L_i = l_i.
Replace θ_k with a point sampled from the prior subject to L(θ) > l_i.
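A toy sketch of this algorithm, again on the illustrative conjugate normal model so the answer can be checked exactly. All settings are my own assumptions, and the constrained prior draw is done by crude rejection sampling, which only works in this easy one-dimensional example; real implementations replace it with constrained MCMC or slice-sampling moves, as noted on the previous slide.

```python
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

rng = np.random.default_rng(7)
y = rng.normal(0.5, 1.0, size=20)       # y_i ~ N(theta, 1); prior theta ~ N(0, v0)
n, v0 = len(y), 5.0
loglik = lambda th: norm.logpdf(y, th, 1.0).sum()

N, I = 100, 600                          # live points, iterations
live = rng.normal(0.0, np.sqrt(v0), N)
live_ll = np.array([loglik(t) for t in live])

log_terms = []
for i in range(1, I + 1):
    k = live_ll.argmin()                 # worst live point, likelihood l_i
    l_i = live_ll[k]
    # w_i = X_{i-1} - X_i with X_i = exp(-i/N), computed on the log scale
    logw = -(i - 1) / N + np.log1p(-np.exp(-1.0 / N))
    log_terms.append(logw + l_i)         # contribution L_i * w_i
    # Replace the worst point by a prior draw with L(theta) > l_i
    while True:
        cand = rng.normal(0.0, np.sqrt(v0))
        cand_ll = loglik(cand)
        if cand_ll > l_i:
            live[k], live_ll[k] = cand, cand_ll
            break

# Spread the remaining prior mass X_I over the surviving live points
log_terms += list(-I / N - np.log(N) + live_ll)
log_ev_ns = logsumexp(log_terms)

vp = 1.0 / (n + 1.0 / v0)                # exact answer for this conjugate model
log_ev_exact = (norm.logpdf(y, 0.0, 1.0).sum()
                + 0.5 * vp * y.sum()**2 + 0.5 * np.log(vp / v0))
print(log_ev_ns, log_ev_exact)
```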
33 Evidence estimation: doubly intractable distributions
π(θ | y) ∝ π(y | θ) π(θ).
Here we assume that the likelihood, π(y | θ), is impossible to evaluate.
34 Evidence estimation: doubly intractable distributions. Ising model
Gibbs random fields, which find use in spatial statistics and statistical network analysis, involve intractable likelihood models.
Ising model: defined on a lattice y = {y_1, ..., y_n}, where the lattice points y_i take values {−1, 1}, with full conditionals π(y_i | y_{−i}, θ) = π(y_i | neighbours of i, θ) and
π(y | θ) ∝ q(y | θ) = exp{ (1/2) θ Σ_{i∼j} y_i y_j }.
Here i ∼ j means i is a neighbour of j.
35 Evidence estimation: doubly intractable distributions. Ising model
First-order and second-order Ising models:
π(y | θ) = exp(θᵗ s(y)) / z(θ),
where s(y) is a sufficient statistic which counts the number of like neighbours, and
z(θ) = Σ_{y_1} ... Σ_{y_n} q(y | θ).
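The sum defining z(θ) has 2^n terms, which is exactly why it is intractable. A brief sketch making the point concrete (lattice size, parameter value and function name are illustrative assumptions): brute-force enumeration is trivial on a 3×3 lattice and hopeless at realistic sizes.

```python
import numpy as np
from itertools import product
from scipy.special import logsumexp

def ising_log_z(theta, rows=3, cols=3):
    # z(theta) = sum over all 2^(rows*cols) spin configurations y of
    # exp(theta * s(y)), where s(y) sums y_i * y_j over first-order
    # (horizontal/vertical) neighbour pairs, each unordered pair counted once
    log_terms = []
    for spins in product((-1, 1), repeat=rows * cols):
        y = np.array(spins).reshape(rows, cols)
        s = (y[:, :-1] * y[:, 1:]).sum() + (y[:-1, :] * y[1:, :]).sum()
        log_terms.append(theta * s)
    return logsumexp(log_terms)

print(ising_log_z(0.4))   # 512 terms at 3x3; 2^256 terms at a mere 16x16
```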
36 Evidence estimation: doubly intractable distributions. Model evidence for MRFs: our approach
π(y) = q(y | θ) π(θ) / { z(θ) π(θ | y) }, for any θ.
Draw from the posterior, and estimate π(θ | y) at a high-probability point θ. Estimate z(θ) using thermodynamic integration.
37 Evidence estimation: doubly intractable distributions. Simulating from the posterior: auxiliary variable method (Møller et al., 2006)
Introduce an auxiliary variable y′ on the same space as the data y and extend the target distribution:
π(θ, y′ | y) ∝ π(y | θ) π(θ) π(y′ | θ_0), for some fixed θ_0.
Jointly update (θ, y′) with proposal
h(θ′, y″ | θ, y′) = h_1(y″ | θ′) h_2(θ′ | θ, y′),
where h_1(y″ | θ′) = π(y″ | θ′) = q(y″ | θ′) / z(θ′).
38 Evidence estimation: doubly intractable distributions. Simulating from the posterior
α(θ′, y″ | θ, y′) = [π(y | θ′) π(θ′) π(y″ | θ_0) π(y′ | θ) h_2(θ | θ′)] / [π(y | θ) π(θ) π(y′ | θ_0) π(y″ | θ′) h_2(θ′ | θ)].
z(θ′) appears in π(y | θ′) above and in π(y″ | θ′) below, and therefore cancels. Similarly z(θ) cancels above and below. The choice of θ_0 is important, e.g. the maximum pseudolikelihood estimate based on y.
39 Evidence estimation: doubly intractable distributions. Exchange algorithm (Murray, Ghahramani and MacKay, 2006)
Sample from an augmented distribution
π(θ, y′, θ′ | y) ∝ π(y | θ) π(θ) h(θ′ | θ) π(y′ | θ′),
whose marginal distribution for θ is the posterior of interest. π(y′ | θ′) is the same likelihood model on which y is defined; h(θ′ | θ) is an arbitrary distribution for the augmented variable θ′, which might depend on θ (e.g. a random walk distribution centred at θ).
40 Evidence estimation: doubly intractable distributions. Exchange algorithm: how it works
1. Gibbs update of (θ′, y′):
   (i) Draw θ′ ~ h(· | θ).
   (ii) Draw y′ ~ π(· | θ′).
2. Exchange move from (θ, y), (θ′, y′) to (θ′, y), (θ, y′) with probability
α = min{ 1, [q(y′ | θ) / q(y | θ)]_(*) × [π(θ′) h(θ | θ′)] / [π(θ) h(θ′ | θ)] × [q(y | θ′) / q(y′ | θ′)]_(**) × [z(θ) z(θ′)] / [z(θ) z(θ′)] },
where the final factor equals 1.
The exchange move proposes to offer the data y the auxiliary parameter θ′ and, similarly, to offer the auxiliary data y′ the parameter θ. The affinity between θ′ and y is measured by (**) and the affinity between θ and y′ by (*).
42 Evidence estimation: doubly intractable distributions. Exchange algorithm for the Ising model
The term
α = min( 1, [π(θ′) / π(θ)] exp{ (θ − θ′)ᵗ (s(y′) − s(y)) } )
can be viewed as a measure of distance between the observed data y and the auxiliary data y′. It is somewhat similar to the accept/reject step in ABC (approximate Bayesian computation). Note: if θ′ → θ, then α → 1. This does not necessarily happen with ABC.
43 Evidence estimation: doubly intractable distributions. Exchange algorithm for the Ising model
The main difficulty is the need to draw an exact sample y′ ~ π(· | θ′). Perfect sampling is an obvious approach. A pragmatic alternative is to take a realisation from a long MCMC run with stationary distribution π(y | θ′) as an approximate draw.
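A hedged, slow pure-Python sketch of the exchange algorithm for a single-parameter Ising model, using the pragmatic alternative just described: the auxiliary draw comes from a finite Gibbs run rather than a perfect sampler. Lattice size, sweep counts, the N(0, prior_sd²) prior on θ and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)

def s(y):
    # Sufficient statistic: sum of y_i * y_j over neighbouring (unordered) pairs
    return (y[:, :-1] * y[:, 1:]).sum() + (y[:-1, :] * y[1:, :]).sum()

def gibbs_ising(theta, shape=(10, 10), sweeps=50):
    # Approximate draw from pi(y | theta) by single-site Gibbs sampling,
    # standing in for the exact sample the algorithm formally requires
    y = rng.choice(np.array([-1, 1]), size=shape)
    R, C = shape
    for _ in range(sweeps):
        for i in range(R):
            for j in range(C):
                nb = ((y[i - 1, j] if i > 0 else 0) + (y[i + 1, j] if i < R - 1 else 0)
                      + (y[i, j - 1] if j > 0 else 0) + (y[i, j + 1] if j < C - 1 else 0))
                p1 = 1.0 / (1.0 + np.exp(-2.0 * theta * nb))  # P(y_ij = +1 | rest)
                y[i, j] = 1 if rng.uniform() < p1 else -1
    return y

def exchange(y_obs, n_iter=500, step=0.1, prior_sd=5.0):
    s_obs, theta, draws = s(y_obs), 0.0, []
    for _ in range(n_iter):
        theta_p = theta + rng.normal(0.0, step)   # h(theta' | theta): random walk
        y_aux = gibbs_ising(theta_p)              # y' ~ pi(. | theta'), approximately
        # log alpha = (theta - theta')(s(y') - s(y)) + log prior ratio, N(0, prior_sd^2)
        log_alpha = ((theta - theta_p) * (s(y_aux) - s_obs)
                     + (theta**2 - theta_p**2) / (2.0 * prior_sd**2))
        if np.log(rng.uniform()) < log_alpha:
            theta = theta_p
        draws.append(theta)
    return np.array(draws)

y_obs = gibbs_ising(0.3)          # synthetic "observed" lattice
print(exchange(y_obs).mean())     # posterior mean of theta
```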
44 Evidence estimation: doubly intractable distributions. Simulation study: Ising model
Data y simulated from an Ising model defined on a lattice, with a single interaction parameter θ. Two competing models: 4 and 8 nearest neighbours. Here the lattices are sufficiently small to allow a very accurate estimate of the Bayes factor: the normalising constant z(θ) can be calculated exactly for a grid of {θ_i} values, which can then be plugged into the right-hand side of
π(θ_i | y) ∝ [q(y | θ_i) / z(θ_i)] π(θ_i),   i = 1, ..., n.
Summing up the right-hand side over the grid yields an estimate of π(y). This serves as a ground truth to compare with the corresponding MCMC-based estimate of the model evidence.
45 Evidence estimation: doubly intractable distributions. Results: Ising model
[Table: true Bayes factor BF versus estimated Bayes factor BF̂, for a range of θ values.]
46 Evidence estimation: doubly intractable distributions. Exponential random graph models
[Figure: friendships in a karate club at a US university.]
47 Evidence estimation: doubly intractable distributions. Exponential random graph models
[Figure: high school dating network.]
48 Evidence estimation: doubly intractable distributions. The exponential random graph (or p*) model
First proposed by Frank and Strauss (JASA, 1986). Let y_ij = 1 denote an edge connecting nodes i and j, and y_ij = 0 otherwise. The data y is an adjacency matrix indicating which nodes are connected by an edge.
1. Edges y_ij and y_kl are neighbours of one another if they share a common node.
2. If y_ij and y_kl are not neighbours, then y_ij and y_kl are conditionally independent, given the rest of the graph.
50 Evidence estimation: doubly intractable distributions. The p* model
π(y | θ) = exp{θᵗ s(y)} / z(θ) = q(y | θ) / z(θ),
where y is the observed graph, s(y) a known vector of sufficient statistics, θ a vector of parameters, and z(θ) the normalising constant:
z(θ) = Σ_{all possible graphs} exp{θᵗ s(y)}.
There are 2^(n choose 2) possible undirected graphs on n nodes, so calculation of z(θ) is infeasible for all but trivially small graphs.
52 Evidence estimation: doubly intractable distributions. Model specification: network statistics
[Figure: (a) directed-graph statistics: edge, mutual edge, 2-in-star, 2-out-star, 2-mixed-star, transitive triad, cyclic triad; (b) undirected-graph statistics: edge, 2-star, 3-star, triangle.]
53 Evidence estimation: doubly intractable distributions. ERGM: Florentine network
Model 1: y ~ edges + 3-star
Model 2: y ~ edges + 2-star
Model 3: y ~ edges + 2-star + 3-star
54 Evidence estimation: doubly intractable distributions. ERGM: Florentine network
Here it is difficult to establish a ground truth. For this purpose, we ran an independence RJMCMC sampler:
1. Sample from each model separately, using the exchange algorithm (here we used the Bergm package of Caimo and NF, 2011).
2. RJMCMC: use the posterior mean and variance for model k as proposal parameters when proposing to jump to model k.
This works well, since the model space is small, but also because each posterior model is unimodal. Acceptance rates for the jump proposals were around 40%, suggesting that the proposal distributions were a good fit to each posterior model. This is essentially the AutoRJ approach outlined in Chapter 6 of Green (2003).
56 Evidence estimation: doubly intractable distributions. ERGM: Florentine network
Here estimates of the posterior model probabilities based on AutoRJ are compared to those based on estimates of the model evidence for each model.
[Table: π(m_1 | y), π(m_2 | y), π(m_3 | y) estimated by AutoRJ and by the model-evidence-based approach.]
57 Evidence estimation: doubly intractable distributions Summary Concluding remarks Model evidence is difficult to compute! Often complex Monte Carlo methods are needed. There are plenty of methods in the Bayesian toolbox. A quick solution is not necessarily the best one!
58 Evidence estimation: doubly intractable distributions. Summary: References
Chib, S. (1995). Marginal likelihood from the Gibbs output. Journal of the American Statistical Association, 90, 1313-1321.
Friel, N. and Pettitt, A. N. (2008). Marginal likelihood estimation via power posteriors. Journal of the Royal Statistical Society, Series B, 70, 589-607.
Newton, M. A. and Raftery, A. E. (1994). Approximate Bayesian inference by the weighted likelihood bootstrap (with discussion). Journal of the Royal Statistical Society, Series B, 56, 3-48.
Neal, R. (2001). Annealed importance sampling. Statistics and Computing, 11, 125-139.
Murray, I., Ghahramani, Z. and MacKay, D. (2006). MCMC for doubly-intractable distributions. In Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence.
Caimo, A. and Friel, N. (2011). Bayesian inference for exponential random graph models. Social Networks, 33, 41-55.
Skilling, J. (2006). Nested sampling for general Bayesian computation. Bayesian Analysis, 1, 833-859.