Estimating the evidence for statistical models


1 Estimating the evidence for statistical models Nial Friel University College Dublin March, 2011

2 Introduction Bayesian model choice Given data $y$ and competing models $m_1, \dots, m_l$, each with parameters $\theta_1, \dots, \theta_l$, respectively. Bayesian inference: $\pi(\theta_k, m_k \mid y) \propto \pi(y \mid \theta_k, m_k)\,\pi(\theta_k \mid m_k)\,\pi(m_k)$.

3 Introduction Model evidence Within model $m_k$: $\pi(\theta_k \mid y, m_k) \propto \pi(y \mid \theta_k, m_k)\,\pi(\theta_k \mid m_k)$. The constant of proportionality is $\pi(y \mid m_k) = \int_{\theta_k} \pi(y \mid \theta_k, m_k)\,\pi(\theta_k \mid m_k)\, d\theta_k$. This is often called the marginal likelihood, integrated likelihood or evidence, and is difficult to compute in general.

4 Introduction Posterior model probabilities Suppose we could compute $\pi(y \mid m_k)$. Then, using Bayes' theorem, we get $\pi(m_k \mid y) = \dfrac{\pi(y \mid m_k)\,\pi(m_k)}{\sum_{j=1}^{l} \pi(y \mid m_j)\,\pi(m_j)}$.

5 Introduction Bayes factors If we have two competing models: $\dfrac{\pi(m_1 \mid y)}{\pi(m_2 \mid y)} = \dfrac{\pi(y \mid m_1)}{\pi(y \mid m_2)} \times \dfrac{\pi(m_1)}{\pi(m_2)}$, i.e. posterior odds = Bayes factor $\times$ prior odds. The Bayes factor is $B_{12} = \pi(y \mid m_1)/\pi(y \mid m_2)$. The larger $B_{12}$ is, the greater the evidence in favour of $m_1$ compared to $m_2$.

6 Introduction Bayesian model averaging Predictions can be made by averaging over all models, weighted by the posterior model probabilities, thereby incorporating model uncertainty: $\pi(\tilde{y} \mid y) = \sum_{k=1}^{l} \pi(\tilde{y} \mid m_k, y)\,\pi(m_k \mid y)$. This is the average of the predictive distribution for $\tilde{y}$ under each model, weighted by the corresponding posterior model probabilities.

7 Introduction Why estimating the model evidence is a challenge $\pi(y \mid m_k)$ is an integral of a (usually) highly variable function over a high-dimensional parameter space. Analytic tractability is sometimes possible, typically when conjugate priors are used, but this is quite rare. Consequently, sophisticated Monte Carlo methods are needed.

8 Introduction Within-model search or across-model search? Within-model search: inference for $\pi(\theta_k \mid y)$ is carried out separately for every $m_k$, and used to estimate $\pi(y \mid m_k)$ for all $k$. There are many approaches under this heading. Across-model search: here inference is carried out over the joint model and parameter space, $\pi(\theta_k, m_k \mid y)$. In an MCMC setting, only one chain is needed! Reversible jump Markov chain Monte Carlo, developed by Green (1995), is the dominant approach (> 1,400 citations to date).

9 Review of evidence estimation Laplace's method Laplace's method (e.g. Tierney and Kadane 1986). Assume that $\pi(\theta_k \mid y)$ is highly peaked around the posterior mode $\hat{\theta}_k$, e.g. if the sample size is large enough. Define $l(\theta_k) = \log\{\pi(y \mid \theta_k)\pi(\theta_k)\}$. Expand $l(\theta_k)$ as a quadratic about $\hat{\theta}_k$ and then exponentiate. The result is an approximation to $\pi(y \mid \theta_k)\pi(\theta_k)$ by a Gaussian with mean $\hat{\theta}_k$ and covariance $\Sigma = (-D^2 l(\hat{\theta}_k))^{-1}$, where $D^2 l(\hat{\theta}_k)$ is the Hessian matrix of second derivatives. Integrating this approximation yields $\pi(y) \approx (2\pi)^{d/2}\, |\Sigma|^{1/2}\, \pi(y \mid \hat{\theta}_k)\,\pi(\hat{\theta}_k)$.
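
To make the recipe concrete, here is a minimal numerical sketch of the Laplace approximation for a toy model (Normal data with a Normal prior on the mean), chosen because the exact evidence is available for comparison; the data, prior settings and variable names are illustrative and not taken from the slides. Since this toy posterior is exactly Gaussian, the approximation should essentially reproduce the exact answer.

```python
import numpy as np
from scipy import optimize, stats

# Toy model: y_i ~ N(theta, 1), prior theta ~ N(0, 25).
rng = np.random.default_rng(1)
y = rng.normal(0.5, 1.0, size=50)

def log_joint(theta):
    # l(theta) = log pi(y | theta) + log pi(theta)
    return stats.norm.logpdf(y, theta, 1.0).sum() + stats.norm.logpdf(theta, 0.0, 5.0)

# Posterior mode and (one-dimensional) Hessian by finite differences.
mode = optimize.minimize_scalar(lambda t: -log_joint(t)).x
eps = 1e-4
hess = (log_joint(mode + eps) - 2 * log_joint(mode) + log_joint(mode - eps)) / eps**2
sigma2 = -1.0 / hess                          # Sigma = (-D^2 l(mode))^{-1}

# Laplace estimate: log pi(y) ~ (d/2) log(2 pi) + (1/2) log|Sigma| + l(mode), with d = 1.
log_ev_laplace = 0.5 * np.log(2 * np.pi) + 0.5 * np.log(sigma2) + log_joint(mode)

# Exact evidence for this conjugate model: y ~ N(0, I + 25 * 11').
n = len(y)
cov = np.eye(n) + 25.0 * np.ones((n, n))
log_ev_exact = stats.multivariate_normal.logpdf(y, mean=np.zeros(n), cov=cov)
print(f"Laplace: {log_ev_laplace:.4f}   exact: {log_ev_exact:.4f}")
```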

10 Review of evidence estimation Harmonic mean estimator Harmonic mean estimator (Newton and Raftery (1994)) $\hat{\pi}(y) = 1 \Big/ \left( \frac{1}{n} \sum_{i=1}^{n} \frac{1}{\pi(y \mid \theta_i)} \right)$, with $\theta_i \sim \pi(\theta \mid y)$. Why does this hold? $E_{\pi(\theta \mid y)}\left\{ \frac{1}{\pi(y \mid \theta)} \right\} = \int \frac{\pi(\theta \mid y)}{\pi(y \mid \theta)}\, d\theta = \int \frac{\pi(y \mid \theta)\,\pi(\theta)}{\pi(y \mid \theta)\,\pi(y)}\, d\theta = \frac{1}{\pi(y)} \int \pi(\theta)\, d\theta = \frac{1}{\pi(y)}$. The bad news?

13 Review of evidence estimation Harmonic mean estimator Harmonic mean estimator (Newton and Raftery (1994)) $\hat{\pi}(y) = 1 \Big/ \left( \frac{1}{n} \sum_{i=1}^{n} \frac{1}{\pi(y \mid \theta_i)} \right)$, with $\theta_i \sim \pi(\theta \mid y)$. This estimator is based solely on draws from the posterior. But the posterior is typically much more peaked than the prior, e.g. when the posterior is insensitive to the prior. Hence in such situations the harmonic mean estimator will not change much as the prior changes, yet $\pi(y)$ is very sensitive to changes in the prior. This drawback is very well documented; see Radford Neal's blog, for example.
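
The estimator and its drawback can both be seen on a toy conjugate model (Normal data, Normal prior on the mean) for which the exact evidence is known; everything below is an illustrative sketch rather than anything from the slides, and the harmonic mean is computed on the log scale with logsumexp for numerical stability.

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp

# Toy conjugate model: y_i ~ N(theta, 1), theta ~ N(0, v).
rng = np.random.default_rng(2)
n, v = 50, 25.0
y = rng.normal(0.5, 1.0, size=n)

# Exact posterior: theta | y ~ N(m_post, v_post); exact draws, so no MCMC error here.
v_post = 1.0 / (n + 1.0 / v)
m_post = v_post * n * y.mean()
theta = rng.normal(m_post, np.sqrt(v_post), size=100_000)

# Harmonic mean: pi_hat(y) = 1 / mean(1 / pi(y | theta_i)), done on the log scale.
log_lik = np.array([stats.norm.logpdf(y, t, 1.0).sum() for t in theta])
log_hm = -(logsumexp(-log_lik) - np.log(log_lik.size))

# Exact log evidence: y ~ N(0, I + v * 11').
cov = np.eye(n) + v * np.ones((n, n))
log_exact = stats.multivariate_normal.logpdf(y, mean=np.zeros(n), cov=cov)
print(f"harmonic mean: {log_hm:.2f}   exact: {log_exact:.2f}")
# Re-running with other seeds, or increasing the prior variance v, shows how erratic
# the harmonic mean estimate is: it barely responds to v, whereas pi(y) certainly does.
```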

14 Review of evidence estimation Chib's method Chib's method (Chib 1995) Chib (1995) presented a generic method which can be applied to output from the Gibbs sampler. Since $\pi(\theta \mid y) = \dfrac{\pi(y \mid \theta)\,\pi(\theta)}{\pi(y)}$, re-writing this gives $\pi(y) = \dfrac{\pi(y \mid \theta)\,\pi(\theta)}{\pi(\theta \mid y)}$. So we could estimate $\log \pi(y)$ as $\log \pi(y) = \log \pi(y \mid \theta^*) + \log \pi(\theta^*) - \log \hat{\pi}(\theta^* \mid y)$, where $\hat{\pi}(\theta^* \mid y)$ is an estimate of the posterior density at a point $\theta^*$ of high posterior probability.

15 Review of evidence estimation Chib's method Chib's method (Chib 1995) Chib's method relies on estimating $\pi(\theta^* \mid y)$. Suppose the vector $\theta$ can be partitioned as $(\theta_1, \theta_2, \theta_3)$, where the full conditional distribution of each $\theta_i$ is standard. Then $\pi(\theta^* \mid y) = \pi(\theta_1^* \mid \theta_2^*, \theta_3^*, y)\,\pi(\theta_2^* \mid \theta_3^*, y)\,\pi(\theta_3^* \mid y)$. The first factor is a full conditional and can be evaluated directly; Gibbs sampling output can be used to estimate the remaining factors on the right-hand side: $\hat{\pi}(\theta_3^* \mid y) = \frac{1}{N} \sum_j \pi(\theta_3^* \mid \theta_1^{(j)}, \theta_2^{(j)}, y)$, and $\hat{\pi}(\theta_2^* \mid \theta_3^*, y) = \frac{1}{N} \sum_j \pi(\theta_2^* \mid \theta_1^{(j)}, \theta_3^*, y)$, the latter from a reduced run with $\theta_3$ fixed at $\theta_3^*$.

16 Review of evidence estimation Chib's method Chib's method (Chib 1995) In general, Chib's method can be applied when $\theta$ is partitioned into an arbitrary number of blocks. The only requirement is that full conditional sampling of each block is possible.
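
A compact sketch of Chib's estimator for a two-block example: Normal data with unknown mean and precision under semi-conjugate priors, so both full conditionals are standard. The posterior ordinate for the mean is Rao-Blackwellised over the Gibbs output, and the precision ordinate is a closed-form full conditional. Prior settings, run lengths and names are illustrative assumptions, not taken from the slides.

```python
import numpy as np
from scipy import stats

# Model: y_i ~ N(mu, 1/tau); priors mu ~ N(m0, v0), tau ~ Gamma(a0, b0) (shape/rate).
rng = np.random.default_rng(3)
y = rng.normal(1.0, 2.0, size=100)
n, ybar = len(y), y.mean()
m0, v0, a0, b0 = 0.0, 100.0, 2.0, 2.0

def mu_cond(tau):
    # Parameters of the full conditional mu | tau, y ~ N(m, v).
    v = 1.0 / (1.0 / v0 + n * tau)
    return v * (m0 / v0 + tau * n * ybar), v

def tau_cond(mu):
    # Parameters of the full conditional tau | mu, y ~ Gamma(a, b).
    return a0 + n / 2.0, b0 + 0.5 * np.sum((y - mu) ** 2)

# Main Gibbs run.
mu, tau, draws = ybar, 1.0, []
for _ in range(6000):
    m, v = mu_cond(tau)
    mu = rng.normal(m, np.sqrt(v))
    a, b = tau_cond(mu)
    tau = rng.gamma(a, 1.0 / b)
    draws.append((mu, tau))
draws = draws[1000:]                               # discard burn-in

# A high posterior probability point (mu*, tau*): posterior means are good enough here.
mu_star = np.mean([d[0] for d in draws])
tau_star = np.mean([d[1] for d in draws])

# pi_hat(mu* | y): Rao-Blackwellised average of the mu full conditional over the tau draws.
dens = []
for _, tau_j in draws:
    m, v = mu_cond(tau_j)
    dens.append(stats.norm.pdf(mu_star, m, np.sqrt(v)))
log_post_mu = np.log(np.mean(dens))

# pi(tau* | mu*, y) is a closed-form Gamma full conditional.
a, b = tau_cond(mu_star)
log_post_tau = stats.gamma.logpdf(tau_star, a, scale=1.0 / b)

# Chib: log pi(y) = log lik + log prior - log posterior ordinate, all at (mu*, tau*).
log_lik = stats.norm.logpdf(y, mu_star, 1.0 / np.sqrt(tau_star)).sum()
log_prior = (stats.norm.logpdf(mu_star, m0, np.sqrt(v0))
             + stats.gamma.logpdf(tau_star, a0, scale=1.0 / b0))
print("Chib log evidence:", log_lik + log_prior - (log_post_mu + log_post_tau))
```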

17 Review of evidence estimation Annealed importance sampling Annealed Importance Sampling (Neal 2001) AIS is a very clever algorithm which shows how tempering can be used to define an importance sampling function to sample from complex distributions. Aside: importance sampling from a target $f(x)$ using an importance function $g(x)$: draw $x^{(1)}, \dots, x^{(N)} \sim g(x)$. Then $E_f\, a(x) \approx \dfrac{\sum_i w^{(i)} a(x^{(i)})}{\sum_i w^{(i)}}$, where $w^{(i)} = \dfrac{f(x^{(i)})}{g(x^{(i)})}$. Further, $\frac{1}{N} \sum_i w^{(i)} \to \frac{z_f}{z_g}$ as $N \to \infty$, where $z_f = \int_x f(x)\, dx$ and $z_g = \int_x g(x)\, dx$.

18 Review of evidence estimation Annealed importance sampling Annealed Importance Sampling (Neal 2001) Define $\pi_i(\theta \mid y) = \pi(\theta)^{1-t_i}\,\pi(\theta \mid y)^{t_i}$, where $1 = t_0 > \dots > t_n = 0$. Thus $\pi_0$ and $\pi_n$ correspond to the posterior and the prior, respectively. Let $T_i$ denote a Markov transition kernel with invariant distribution $\pi_i$. For $j = 1, \dots, N$: sample $\theta_{n-1}$ from $\pi_n$; sample $\theta_{n-2}$ from $\theta_{n-1}$ using $T_{n-1}$; ...; sample $\theta_0$ from $\theta_1$ using $T_1$. Set $\theta^{(j)} = \theta_0$ and $w^{(j)} = \dfrac{\pi_{n-1}(\theta_{n-1})}{\pi_n(\theta_{n-1})} \, \dfrac{\pi_{n-2}(\theta_{n-2})}{\pi_{n-1}(\theta_{n-2})} \cdots \dfrac{\pi_0(\theta_0)}{\pi_1(\theta_0)}$.

19 Review of evidence estimation Annealed importance sampling Annealed Importance Sampling AIS yields: 1. An independent sample $\{\theta^{(j)}\}$ from $\pi(\theta \mid y)$. 2. An estimator of the evidence: $\pi(y) \approx \frac{1}{N} \sum_{j=1}^{N} w^{(j)}$.
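
A sketch of AIS on a one-dimensional toy model (Normal data, Normal prior), with a geometric bridge between prior and posterior and a single random-walk Metropolis step per temperature; the ladder, proposal scale and number of particles are illustrative choices, not prescriptions from the slides.

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp

# Toy model: y_i ~ N(theta, 1), theta ~ N(0, 25).
rng = np.random.default_rng(4)
y = rng.normal(0.5, 1.0, size=30)

def log_prior(t):
    return stats.norm.logpdf(t, 0.0, 5.0)

def log_lik(t):
    return stats.norm.logpdf(y, t, 1.0).sum()

def log_tempered(t, beta):
    # Unnormalised tempered density: log prior + beta * log likelihood.
    return log_prior(t) + beta * log_lik(t)

betas = np.linspace(0.0, 1.0, 51)        # temperature ladder from prior to posterior
N = 500                                  # number of AIS particles
log_w = np.zeros(N)

for j in range(N):
    theta = rng.normal(0.0, 5.0)         # draw from the prior (beta = 0)
    for i in range(1, len(betas)):
        # Accumulate the incremental importance weight at the current state.
        log_w[j] += (betas[i] - betas[i - 1]) * log_lik(theta)
        # One Metropolis step leaving the current tempered density invariant.
        prop = theta + rng.normal(0.0, 0.5)
        if np.log(rng.uniform()) < log_tempered(prop, betas[i]) - log_tempered(theta, betas[i]):
            theta = prop

# The average weight estimates pi(y); work on the log scale for stability.
print("AIS log evidence:", logsumexp(log_w) - np.log(N))
```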

20 Review of evidence estimation Power posteriors Evidence estimation via power posteriors (NF and Pettitt (2008)) Consider the power posterior: $\pi(\theta \mid y, t) \propto \{\pi(y \mid \theta)\}^{t}\,\pi(\theta)$, with temperature $t \in [0, 1]$ (more generally, a schedule $T: [0,1] \to [0,1]$ with $T(0) = 0$ and $T(1) = 1$ may be used for the exponent). Its normalising constant is $z(y \mid t) = \int_\theta \{\pi(y \mid \theta)\}^{t}\,\pi(\theta)\, d\theta$. $z(y \mid t = 1)$: the posterior model evidence. $z(y \mid t = 0)$: the integral of the prior for $\theta$, which equals 1.

21 Review of evidence estimation Power posteriors Evidence via power posteriors The evidence follows from the identity: $\log \pi(y) = \log\left\{ \dfrac{z(y \mid t=1)}{z(y \mid t=0)} \right\} = \int_0^1 E_{\theta \mid y, t}\, \log \pi(y \mid \theta)\, dt$. Proof: $\dfrac{d}{dt} \log z(y \mid t) = \dfrac{z'(y \mid t)}{z(y \mid t)} = \dfrac{1}{z(y \mid t)} \int \dfrac{d}{dt} \{\pi(y \mid \theta)\}^{t}\,\pi(\theta)\, d\theta = \int \log \pi(y \mid \theta)\, \dfrac{\{\pi(y \mid \theta)\}^{t}\,\pi(\theta)}{z(y \mid t)}\, d\theta = E_{\theta \mid y, t}\, \log \pi(y \mid \theta)$.

22 Review of evidence estimation Power posteriors Evidence via power posteriors $\dfrac{d}{dt} \log z(y \mid t) = E_{\theta \mid y, t}\, \log \pi(y \mid \theta)$. This is the mean deviance with respect to $\pi(\theta \mid y, t)$, the power posterior. Integrating with respect to $t$ yields $\log \pi(y) = \log\left\{ \dfrac{z(y \mid t=1)}{z(y \mid t=0)} \right\} = \int_0^1 E_{\theta \mid y, t}\, \log \pi(y \mid \theta)\, dt$. This is essentially an application of thermodynamic integration, which was first developed in the statistical physics community and is outlined in Gelman and Meng (1998).

23 Review of evidence estimation Power posteriors In practice: Discretise $t \in [0, 1]$ as $0 = t_0 < t_1 < \dots < t_n = 1$. For each $t_i$: sample $\theta \sim \pi(\theta \mid y, t_i)$ and estimate $E_i = E_{\theta \mid y, t_i}\, \log \pi(y \mid \theta)$. Then apply the trapezoidal rule: $\log \widehat{\pi}(y) = \sum_{i=1}^{n} (t_i - t_{i-1})\, \dfrac{E_{i-1} + E_i}{2}$.
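
The whole recipe in a few lines, for the toy conjugate model analysed on the next slide, where the power posterior is itself Gaussian and so can be sampled exactly at each temperature; the ladder below concentrates rungs near $t = 0$ (a schedule of the form $t_i = (i/n)^c$ is one common choice), and all settings here are illustrative.

```python
import numpy as np
from scipy import stats

# Toy model: y_i ~ N(theta, 1), theta ~ N(0, v). The power posterior at temperature t
# is N(m_t, v_t) with m_t = n*t*ybar / (n*t + 1/v) and v_t = 1 / (n*t + 1/v).
rng = np.random.default_rng(5)
n, v = 50, 25.0
y = rng.normal(0.5, 1.0, size=n)
ybar = y.mean()

ts = np.linspace(0.0, 1.0, 21) ** 4      # temperature ladder, dense near t = 0

E = []
for t in ts:
    v_t = 1.0 / (n * t + 1.0 / v)
    m_t = v_t * n * t * ybar
    theta = rng.normal(m_t, np.sqrt(v_t), size=2000)   # exact draws from the power posterior
    loglik = np.array([stats.norm.logpdf(y, th, 1.0).sum() for th in theta])
    E.append(loglik.mean())                            # E_i = E_{theta | y, t} log pi(y | theta)
E = np.array(E)

# Trapezoidal rule over the ladder gives the log evidence.
log_evidence_pp = np.sum(np.diff(ts) * (E[:-1] + E[1:]) / 2.0)

# Exact log evidence for comparison: y ~ N(0, I + v * 11').
cov = np.eye(n) + v * np.ones((n, n))
log_evidence_exact = stats.multivariate_normal.logpdf(y, mean=np.zeros(n), cov=cov)
print(f"power posterior: {log_evidence_pp:.3f}   exact: {log_evidence_exact:.3f}")
```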

24 Review of evidence estimation Power posteriors Sensitivity of $\pi(y)$ to the prior - toy example How does sensitivity to the prior impact on this method? Suppose $y = \{y_i\}$ iid $N(\theta, 1)$. A priori, $\theta \sim N(m, v)$. Then the power posterior is $\theta \mid y, t \sim N(m_t, v_t)$, where $m_t = \dfrac{nt\bar{y} + m/v}{nt + 1/v}$ and $v_t = \dfrac{1}{nt + 1/v}$, and $E_{\theta \mid y, t}\, \log \pi(y \mid \theta) = -\dfrac{n}{2}\log 2\pi - \dfrac{1}{2}\sum_{i=1}^{n}(y_i - \bar{y})^2 - \dfrac{n}{2}\,\dfrac{(m - \bar{y})^2}{(vnt + 1)^2} - \dfrac{n}{2}\,\dfrac{1}{nt + 1/v}$. When $t = 0$ the final term is $-nv/2$; as $v \to \infty$, the magnitude of $E_{\theta \mid y, t=0}$ grows without bound.
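
A few lines to evaluate this expression numerically and see the behaviour near $t = 0$ for increasing prior variance; the simulated data below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50
y = rng.normal(0.0, 1.0, size=n)
ybar, ss = y.mean(), np.sum((y - y.mean()) ** 2)

def expected_log_lik(t, m=0.0, v=1.0):
    # E_{theta | y, t} log pi(y | theta) for the N(theta, 1) model with theta ~ N(m, v).
    return (-0.5 * n * np.log(2 * np.pi) - 0.5 * ss
            - 0.5 * n * (m - ybar) ** 2 / (v * n * t + 1) ** 2
            - 0.5 * n / (n * t + 1.0 / v))

for v in (1.0, 5.0, 10.0):
    print(f"v = {v:5.1f}:", [round(expected_log_lik(t, v=v), 1) for t in (0.0, 0.01, 0.1, 1.0)])
# The larger the prior variance v, the more steeply the mean deviance climbs near t = 0,
# which is exactly where the trapezoidal rule needs fine temperature spacing.
```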

25 Review of evidence estimation Power posteriors Expected deviance under the distribution $\theta \mid y, t$, plotted against $t$, for prior variance equal to 10, 5 and 1. As $v$ increases, so too does the rate at which the mean deviance changes with $t$.

26 Review of evidence estimation Power posteriors Connection to the fractional Bayes estimator The fraction $z(y \mid t=1)/z(y \mid t=a)$, where $a$ is close to 0, is precisely the estimate of the marginal likelihood used in the fractional Bayes estimate of the Bayes factor (O'Hagan 1995): $\pi(y) \approx \dfrac{z(y \mid t=1)}{z(y \mid t=a)} = \dfrac{\int_\theta \pi(y \mid \theta)\,\pi(\theta)\, d\theta}{\int_\theta \{\pi(y \mid \theta)\}^{a}\,\pi(\theta)\, d\theta}$, and $\log \dfrac{z(y \mid t=1)}{z(y \mid t=a)} = \int_a^1 E_{\theta \mid y, t}\, \log \pi(y \mid \theta)\, dt$. This method was proposed to compute Bayes factors with uninformative priors. Impropriety in $\pi(\theta)$ cancels above and below; essentially a fraction $a$ of the data is borrowed for the prior.

27 Review of evidence estimation Power posteriors Power posterior approach It is relatively straightforward to code and implement. It is a generic method. In some cases it can be implemented in WinBUGS. Choosing the temperature schedule is vital; this is the weakness of this approach. Behrens, NF and Hurn (2011) offer some possibilities in this direction.

28 Review of evidence estimation Nested sampling Nested sampling (Skilling, 2006) (For the moment, for ease of notation, let $L(\theta) = \pi(y \mid \theta)$.) $\pi(y) = \int L(\theta)\,\pi(\theta)\, d\theta = \int L(\theta)\, dX$, where $dX = \pi(\theta)\, d\theta$ is an element of prior mass. Define $X(\lambda) = \int_{L(\theta) > \lambda} \pi(\theta)\, d\theta$ as a cumulative prior mass. Write the inverse function as $L(X)$, i.e. $L(X(\lambda)) = \lambda$. This then allows us to express the evidence as a one-dimensional integral: $\pi(y) = \int_0^1 L(X)\, dX$.

31 Review of evidence estimation Nested sampling Nested sampling The main computational burden is the requirement to sample $\theta$ from the prior subject to the constraint that $L(\theta) > l$. This is roughly similar to the computational effort of slice sampling (Neal, 2003). The evidence is estimated by sorting draws from the prior according to their likelihood: $\widehat{\pi}(y) = \widehat{Z} = \sum_{i=1}^{I-1} (X_i - X_{i+1})\, L_i$.

32 Review of evidence estimation Nested sampling Sketch of algorithm Sample $\theta_1, \dots, \theta_N$ from the prior. Repeat for $i = 1, \dots, I$: Find the point $\theta_k$ with the smallest likelihood, $l_i$, among the $N$ current points. Set $X_i = \exp(-i/N)$ and $w_i = X_{i-1} - X_i$. Increment $Z$ by $l_i w_i$. Replace $\theta_k$ with a point sampled from the prior subject to $L(\theta) > l_i$.
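
A sketch of this algorithm on the same one-dimensional toy model used earlier (Normal data, Normal prior), where the constrained prior draws are done by naive rejection; that is only workable in such a low-dimensional example, and real implementations replace it with constrained MCMC or slice-type moves. All settings below are illustrative.

```python
import numpy as np
from scipy import stats

# Toy model: y_i ~ N(theta, 1), theta ~ N(0, 25).
rng = np.random.default_rng(7)
y = rng.normal(0.5, 1.0, size=30)

def log_lik(t):
    return stats.norm.logpdf(y, t, 1.0).sum()

N, I = 200, 1500                         # live points and number of iterations
live = rng.normal(0.0, 5.0, size=N)      # draws from the prior
live_ll = np.array([log_lik(t) for t in live])

Z, X_prev = 0.0, 1.0
for i in range(1, I + 1):
    k = np.argmin(live_ll)               # worst live point
    l_i = live_ll[k]
    X_i = np.exp(-i / N)                 # deterministic prior-mass schedule
    Z += np.exp(l_i) * (X_prev - X_i)    # increment the evidence
    X_prev = X_i
    # Replace the worst point with a prior draw satisfying L(theta) > l_i (rejection,
    # which gets slower and slower as the constraint tightens).
    while True:
        cand = rng.normal(0.0, 5.0)
        cand_ll = log_lik(cand)
        if cand_ll > l_i:
            live[k], live_ll[k] = cand, cand_ll
            break

Z += np.exp(live_ll).mean() * X_prev     # crude termination: add the remaining live points
print("nested sampling log evidence:", np.log(Z))
```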

33 Evidence estimation: doubly intractable distributions Doubly intractable distributions $\pi(\theta \mid y) \propto \pi(y \mid \theta)\,\pi(\theta)$. Here we assume that the likelihood, $\pi(y \mid \theta)$, is impossible to evaluate.

34 Evidence estimation: doubly intractable distributions Ising model Doubly intractable distributions Gibbs random fields, which find use in spatial statistics and statistical network analysis, involve intractable likelihood models. Ising model Defined on a lattice $y = \{y_1, \dots, y_n\}$. Lattice points $y_i$ take values $\{-1, 1\}$. Full conditional: $\pi(y_i \mid y_{-i}, \theta) = \pi(y_i \mid \text{neighbours of } i, \theta)$. $\pi(y \mid \theta) \propto q(y \mid \theta) = \exp\left\{ \frac{\theta_1}{2} \sum_{i \sim j} y_i y_j \right\}$. Here $i \sim j$ means $i$ is a neighbour of $j$.

35 Evidence estimation: doubly intractable distributions Ising model 1st order and 2nd order Ising models: $\pi(y \mid \theta) = \dfrac{\exp(\theta^{t} s(y))}{z(\theta)}$, where $s(y)$ is a sufficient statistic which counts the number of like neighbours, and $z(\theta) = \sum_{y_1} \cdots \sum_{y_n} q(y \mid \theta)$ sums over all possible lattice configurations.
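
To see the intractability concretely: on a tiny lattice one can enumerate every configuration and compute $z(\theta)$ by brute force, which rapidly becomes impossible as the lattice grows. The sketch below uses an illustrative $3 \times 4$ lattice and takes $s(y)$ to be the sum of $y_i y_j$ over neighbouring pairs, one common convention (the slides' exact parametrisation may differ by a constant factor).

```python
import numpy as np
from itertools import product

NR, NC = 3, 4                                     # 12 sites -> 2^12 = 4096 configurations

def suff_stat(y):
    # s(y) = sum over horizontal and vertical neighbour pairs of y_i * y_j.
    return np.sum(y[:, :-1] * y[:, 1:]) + np.sum(y[:-1, :] * y[1:, :])

# Enumerate every configuration once and store its sufficient statistic.
configs = np.array(list(product([-1, 1], repeat=NR * NC))).reshape(-1, NR, NC)
stats_all = np.array([suff_stat(c) for c in configs])

def log_z(theta):
    # log z(theta) = log sum_y exp(theta * s(y)), computed stably.
    a = theta * stats_all
    m = a.max()
    return m + np.log(np.exp(a - m).sum())

for theta in (0.0, 0.2, 0.4):
    print(f"theta = {theta:.1f}: log z(theta) = {log_z(theta):.3f}")
# For a 16 x 16 lattice the sum would already have 2^256 terms, hence "doubly intractable".
```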

36 Evidence estimation: doubly intractable distributions Ising model Model evidence for MRFs: our approach $\pi(y) = \dfrac{q(y \mid \theta)\,\pi(\theta)}{z(\theta)\,\pi(\theta \mid y)}$, for any $\theta$. Draw from the posterior, and estimate $\pi(\theta \mid y)$ at a high-probability $\theta$. Estimate $z(\theta)$ using thermodynamic integration.

37 Evidence estimation: doubly intractable distributions Simulating from the posterior Auxiliary variable method (Møller et al., 2006) Introduce an auxiliary variable $y'$ on the same space as the data $y$ and extend the target distribution: $\pi(\theta, y' \mid y) \propto \pi(y \mid \theta)\,\pi(\theta)\,\pi(y' \mid \theta_0)$, for some fixed $\theta_0$. Jointly update $(\theta, y')$ with proposal $h(\theta', y'' \mid \theta, y') = h_1(y'' \mid \theta')\, h_2(\theta' \mid \theta, y')$, where $h_1(y'' \mid \theta') = \pi(y'' \mid \theta') = \dfrac{q(y'' \mid \theta')}{z(\theta')}$.

38 Evidence estimation: doubly intractable distributions Simulating from the posterior $\alpha(\theta', y'' \mid \theta, y') = \dfrac{\pi(y \mid \theta')\,\pi(\theta')\,\pi(y'' \mid \theta_0)\,\pi(y' \mid \theta)\,h_2(\theta \mid \theta')}{\pi(y \mid \theta)\,\pi(\theta)\,\pi(y' \mid \theta_0)\,\pi(y'' \mid \theta')\,h_2(\theta' \mid \theta)}$. $z(\theta')$ appears in $\pi(y \mid \theta')$ above and in $\pi(y'' \mid \theta')$ below, and therefore cancels. Similarly, $z(\theta)$ cancels above and below. The choice of $\theta_0$ is important, e.g. the maximum pseudolikelihood estimate based on $y$.

39 Evidence estimation: doubly intractable distributions Exchange algorithm Exchange algorithm (Murray, Ghahramani & MacKay 2006) Sample from an augmented distribution $\pi(\theta, y', \theta' \mid y) \propto \pi(y \mid \theta)\,\pi(\theta)\,h(\theta' \mid \theta)\,\pi(y' \mid \theta')$, whose marginal distribution for $\theta$ is the posterior of interest. $\pi(y' \mid \theta')$ is the same likelihood model on which $y$ is defined. $h(\theta' \mid \theta)$ is an arbitrary distribution for the augmented variable $\theta'$ which might depend on $\theta$ (e.g. a random walk distribution centred at $\theta$).

40 Evidence estimation: doubly intractable distributions Exchange algorithm Exchange algorithm How it works 1. Gibbs update of $(\theta', y')$: (i) draw $\theta' \sim h(\cdot \mid \theta)$; (ii) draw $y' \sim \pi(\cdot \mid \theta')$. 2. Exchange move from $(\theta, y), (\theta', y')$ to $(\theta', y), (\theta, y')$ with probability $\alpha = \min\left( 1,\ \underbrace{\dfrac{q(y \mid \theta')}{q(y \mid \theta)}}_{(*)} \, \dfrac{\pi(\theta')\,h(\theta \mid \theta')}{\pi(\theta)\,h(\theta' \mid \theta)} \, \underbrace{\dfrac{q(y' \mid \theta)}{q(y' \mid \theta')}}_{(**)} \, \underbrace{\dfrac{z(\theta)\,z(\theta')}{z(\theta)\,z(\theta')}}_{=1} \right)$. The exchange move proposes to offer the data $y$ the auxiliary parameter $\theta'$ and, similarly, to offer the auxiliary data $y'$ the parameter $\theta$. The affinity between $\theta'$ and $y$ is measured by $(*)$ and the affinity between $\theta$ and $y'$ by $(**)$.

42 Evidence estimation: doubly intractable distributions Exchange algorithm Exchange algorithm for the Ising model The term $\alpha = \min\left( 1,\ \dfrac{\pi(\theta')}{\pi(\theta)} \exp\left\{ (\theta - \theta')^{t}\, (s(y') - s(y)) \right\} \right)$ can be viewed as a measure of distance between the observed data $y$ and the auxiliary data $y'$. It is somewhat similar to the accept/reject step in ABC (approximate Bayesian computation). Note: if $\theta' \approx \theta$, then $\alpha \approx 1$. This does not necessarily happen with ABC.

43 Evidence estimation: doubly intractable distributions Exchange algorithm Exchange algorithm for the Ising model The main difficulty is the need to draw an exact sample $y' \sim \pi(\cdot \mid \theta')$. Perfect sampling is an obvious approach. A pragmatic alternative is to take a realisation from a long MCMC run with stationary distribution $\pi(y' \mid \theta')$ as an approximate draw.
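
A self-contained sketch of the exchange algorithm for a small first-order Ising model, using the pragmatic approximation just described, i.e. a long Gibbs run in place of a perfect sampler for the auxiliary draw. The lattice size, flat prior, proposal scale and run lengths are illustrative, and the pure-Python sweeps are slow (tens of seconds) but keep the structure visible.

```python
import numpy as np

rng = np.random.default_rng(9)

def suff_stat(y):
    # s(y): sum of y_i * y_j over horizontal and vertical neighbour pairs.
    return np.sum(y[:, :-1] * y[:, 1:]) + np.sum(y[:-1, :] * y[1:, :])

def gibbs_sweep(y, theta):
    # One pass of single-site full-conditional updates (free boundary).
    nr, nc = y.shape
    for i in range(nr):
        for j in range(nc):
            s = ((y[i - 1, j] if i > 0 else 0) + (y[i + 1, j] if i < nr - 1 else 0) +
                 (y[i, j - 1] if j > 0 else 0) + (y[i, j + 1] if j < nc - 1 else 0))
            y[i, j] = 1 if rng.uniform() < 1.0 / (1.0 + np.exp(-2.0 * theta * s)) else -1
    return y

def draw_auxiliary(theta, shape, sweeps=100):
    # Approximate draw y' ~ pi(. | theta) via a long Gibbs run from a random start.
    x = rng.choice([-1, 1], size=shape)
    for _ in range(sweeps):
        x = gibbs_sweep(x, theta)
    return x

# "Observed" data simulated at a known theta, so the posterior can be eyeballed.
true_theta = 0.3
y_obs = draw_auxiliary(true_theta, (10, 10), sweeps=300)
s_obs = suff_stat(y_obs)

# Exchange algorithm with a flat prior on (0, 1) and a random-walk proposal.
theta, chain = 0.5, []
for _ in range(500):
    theta_prop = theta + rng.normal(0.0, 0.05)
    if 0.0 < theta_prop < 1.0:
        y_aux = draw_auxiliary(theta_prop, y_obs.shape)
        # log alpha = (theta - theta') * (s(y') - s(y)); the flat prior cancels.
        if np.log(rng.uniform()) < (theta - theta_prop) * (suff_stat(y_aux) - s_obs):
            theta = theta_prop
    chain.append(theta)

print("posterior mean of theta:", np.mean(chain[100:]))
```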

44 Evidence estimation: doubly intractable distributions Ising model Simulation study: Ising model Data $y$ simulated from an Ising model defined on a lattice, with a single interaction parameter $\theta$. Two competing models: 4 and 8 nearest neighbours. Here the lattices are sufficiently small to allow a very accurate estimate of the Bayes factor: the normalising constant $z(\theta)$ can be calculated exactly for a grid of $\{\theta_i\}$ values, which can then be plugged into the right-hand side of $\pi(\theta_i \mid y) \propto \dfrac{q(y \mid \theta_i)}{z(\theta_i)}\, \pi(\theta_i)$, $i = 1, \dots, n$. Summing up the right-hand side over the grid yields an estimate of $\pi(y)$. This serves as a ground truth to compare with the corresponding MCMC-based estimate of the model evidence.
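
On a lattice small enough to enumerate, this check can be reproduced end to end. The sketch below does so for an illustrative $3 \times 4$ lattice (repeating the brute-force helpers from the earlier sketch so that it runs on its own), with a uniform prior on $\theta$ and a simple trapezoidal quadrature over a grid of $\theta$ values; the slides' study uses lattices that are small enough for exact calculation of $z(\theta)$, which this miniature only imitates.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(10)
NR, NC = 3, 4                                      # tiny lattice: 2^12 configurations

def suff_stat(y):
    return np.sum(y[:, :-1] * y[:, 1:]) + np.sum(y[:-1, :] * y[1:, :])

configs = np.array(list(product([-1, 1], repeat=NR * NC))).reshape(-1, NR, NC)
stats_all = np.array([suff_stat(c) for c in configs])

def log_z(theta):
    a = theta * stats_all
    m = a.max()
    return m + np.log(np.exp(a - m).sum())

# "Observed" data: any fixed configuration serves for the illustration.
y_obs = rng.choice([-1, 1], size=(NR, NC))
s_obs = suff_stat(y_obs)

# Uniform prior theta ~ U(0, 1); pi(y) = int q(y | theta) / z(theta) * pi(theta) d theta,
# approximated on a grid of theta values with the trapezoidal rule.
grid = np.linspace(0.0, 1.0, 101)
integrand = np.exp(np.array([theta * s_obs - log_z(theta) for theta in grid]))
log_evidence = np.log(np.sum(np.diff(grid) * (integrand[:-1] + integrand[1:]) / 2.0))
print("grid-based log evidence:", log_evidence)
```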

45 Evidence estimation: doubly intractable distributions Ising model Results: Ising model [Table comparing the exact Bayes factor (BF) with its MCMC-based estimate ($\widehat{\mathrm{BF}}$) across a range of $\theta$ values; the numerical entries did not survive the transcription.]

46 Evidence estimation: doubly intractable distributions Exponential random graph models Friendships in a karate club in a US university.

47 Evidence estimation: doubly intractable distributions Exponential random graph models High school dating

48 Evidence estimation: doubly intractable distributions Exponential random graph models The exponential random graph (or $p^*$) model First proposed by Frank and Strauss (JASA, 1986). Let $y_{ij} = 1$ denote an edge connecting nodes $i$ and $j$, and 0 otherwise. The data $y$ form an adjacency matrix indicating which nodes are connected by an edge. 1. Edges $y_{ij}$ and $y_{kl}$ are neighbours of one another if they share a common node. 2. If $y_{ij}$ and $y_{kl}$ are not neighbours, then $y_{ij}$ and $y_{kl}$ are conditionally independent, given the rest of the graph.

50 Evidence estimation: doubly intractable distributions Exponential random graph models The $p^*$ model $\pi(y \mid \theta) = \dfrac{\exp\{\theta^{t} s(y)\}}{z(\theta)} = \dfrac{q(y \mid \theta)}{z(\theta)}$, where $y$ is the observed graph, $s(y)$ is a known vector of sufficient statistics, $\theta$ is a vector of parameters, and $z(\theta)$ is the normalising constant $z(\theta) = \sum_{\text{all possible graphs}} \exp\{\theta^{t} s(y)\}$. There are $2^{\binom{n}{2}}$ possible undirected graphs on $n$ nodes, so calculation of $z(\theta)$ is infeasible for all but trivially small graphs.

52 Evidence estimation: doubly intractable distributions Exponential random graph models Model Specification: Network Statistics. (a) edge, mutual edge, 2-in-star, 2-out-star, 2-mixed-star, transitive triad, cyclic triad; (b) edge, 2-star, 3-star, triangle.

53 Evidence estimation: doubly intractable distributions Exponential random graph models ERGM: Florentine network Model 1: y ~ edges + 3-star. Model 2: y ~ edges + 2-star. Model 3: y ~ edges + 2-star + 3-star.

54 Evidence estimation: doubly intractable distributions Exponential random graph models ERGM: Florentine network Here it is difficult to establish a ground truth. For this purpose, we ran an independence RJMCMC sampler: 1. Sample from each model, separately, using the exchange algorithm. (Here we used the Bergm package of Caimo and NF (2011).) 2. RJMCMC: Use the posterior mean and variance for model $k$ as proposal parameters when proposing to jump to model $k$. This works well, since the model space is small, but also because each posterior model is unimodal. Acceptance rates for the jump proposals were around 40%, suggesting that the proposal distributions were a good fit to each posterior model. This is essentially the AutoRJ approach outlined in Chapter 6 of Green (2003).

56 Evidence estimation: doubly intractable distributions Exponential random graph models ERGM: Florentine network Here estimates of posterior model probabilities based on AutoRJ are compared to those based on estimates of the model evidence for each model. [Table with columns $\pi(m_1 \mid y)$, $\pi(m_2 \mid y)$, $\pi(m_3 \mid y)$ and rows AutoRJ and model-evidence-based; the numerical entries did not survive the transcription.]

57 Evidence estimation: doubly intractable distributions Summary Concluding remarks Model evidence is difficult to compute! Often complex Monte Carlo methods are needed. There are plenty of methods in the Bayesian toolbox. A quick solution is not necessarily the best one!

58 Evidence estimation: doubly intractable distributions Summary References
Chib, S. (1995) Marginal likelihood from the Gibbs output. Journal of the American Statistical Association, 90.
Friel, N. and Pettitt, A. N. (2008) Marginal likelihood estimation via power posteriors. Journal of the Royal Statistical Society, Series B, 70.
Newton, M. A. and Raftery, A. E. (1994) Approximate Bayesian inference with the weighted likelihood bootstrap (with discussion). Journal of the Royal Statistical Society, Series B, 56.
Neal, R. (2001) Annealed importance sampling. Statistics and Computing, 11.
Murray, I., Ghahramani, Z. and MacKay, D. (2006) MCMC for doubly-intractable distributions. In Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence.
Caimo, A. and Friel, N. (2011) Bayesian inference for the exponential random graph model. Social Networks, 33.
Skilling, J. (2006) Nested sampling for general Bayesian computation. Bayesian Analysis, 1.
