Spatial Statistics, Chapter 3: Basics of areal data and areal data modeling


1 Recall that areal data, also known as lattice data, are data $Y(s)$, $s \in D$, where $D$ is a discrete index set. This usually corresponds to data $Y_1, \ldots, Y_n$ observed on a set of geographical units (over a map), the pixels of an image, or a regular arrangement of points on a lattice.

2 Models for areal data are also sometimes employed for irregularly arranged point-referenced data sets when the number of spatial units is very large, for computational reasons.

3 As we shall see in Chapter 5, certain types of areal models are computationally easier to work with and are ideal for use with the Gibbs sampler. In this setting, unlike the geostatistical one, we are typically not interested in prediction, and we have observed data at all spatial sites. What is of interest in this setting? Is a spatial pattern evident? Are there clusters of high or low values?

4 Smoothing: filter out some of the noise in the data to help elucidate the spatial pattern. Deciding how much to smooth the data is not always clear: smoother maps are easier to interpret but will generally not represent the data well, and vice versa. Example: no smoothing at all is equivalent to presenting a raw map of the data, while extreme smoothing would associate the same value $\bar{Y}$ with all units. Optimal smoothing lies somewhere between these two extremes.

5 Also of interest in this setting is relating the response to covariates through regression models; we need to account for spatial dependence in such regression models. Also in the regression setting, we would be interested in examining the residual spatial structure after accounting for covariates. Exploratory methods for areal data: recall that the primary source of spatial information in the areal setting consists of adjacencies, i.e., knowing, for each region, all the neighboring regions (for some appropriate definition of neighbor) — in other words, the arrangement of the regions across the map.

6 This adjacency structure is quantified through the neighborhood (or proximity) matrix $W$:

$$W_{ij} = \begin{cases} 0 & \text{if } i = j \\ 0 & \text{if } i \text{ and } j \text{ are not neighbors} \\ c_{ij} > 0 & \text{if } i \text{ and } j \text{ are neighbors,} \end{cases}$$

where $c_{ij}$ quantifies the strength of the neighbor relationship. Most often $c_{ij} = 1$ for all neighbor pairs, and two regions are considered neighbors if they share a common boundary. It is instructive to think of this spatial structure as a graph, where nodes correspond to regions and two nodes on the graph are connected if the associated regions are neighbors.
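As a minimal sketch of this construction in Python (the 4-region adjacency list here is hypothetical, purely for illustration):

```python
import numpy as np

# Build a binary (0/1) neighborhood matrix W from an adjacency list for a
# hypothetical map of 4 regions; regions are neighbors if they share a boundary.
neighbors = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}

n = len(neighbors)
W = np.zeros((n, n))
for i, nbrs in neighbors.items():
    for j in nbrs:
        W[i, j] = 1.0           # c_ij = 1 for all neighbor pairs

assert np.allclose(W, W.T)      # adjacency is symmetric
assert np.all(np.diag(W) == 0)  # W_ii = 0 by convention
```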

7 The neighborhood matrix $W$ can be used for exploratory analysis and will also be used when we discuss models for areal data. Note that it is also possible to define 2nd-order neighbors and a corresponding 2nd-order neighborhood matrix. After simply plotting the data (usually on a map in this case), an exploratory analysis usually proceeds with an attempt to quantify the strength of spatial association in the data.

8 For this, two statistics can be employed. 1. Moran's I:

$$I = \frac{n \sum_i \sum_j w_{ij}(Y_i - \bar{Y})(Y_j - \bar{Y})}{\left(\sum_i \sum_j w_{ij}\right) \sum_i (Y_i - \bar{Y})^2},$$

where $I \approx 0$ indicates no spatial dependence, $I > 0$ positive spatial dependence, and $I < 0$ negative spatial dependence. It can be thought of as an areal correlation coefficient.

9 2. Geary's C:

$$C = \frac{(n-1) \sum_i \sum_j w_{ij}(Y_i - Y_j)^2}{2\left(\sum_i \sum_j w_{ij}\right) \sum_i (Y_i - \bar{Y})^2},$$

where $C \geq 0$; $C \approx 1$ indicates no spatial dependence, $C < 1$ positive spatial dependence, and $C > 1$ negative spatial dependence. Under the hypothesis that the $Y_i$'s are iid, one can show that the asymptotic distributions of both statistics are normal and that $E[I] = 0$ and $E[C] = 1$.
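A minimal sketch of both statistics in Python, following the formulas above (the function names `morans_I` and `gearys_C` are my own, not from the course):

```python
import numpy as np

def morans_I(y, W):
    """Moran's I: an areal analogue of a correlation coefficient."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    d = y - y.mean()
    num = n * (d @ W @ d)            # n * sum_ij w_ij (y_i - ybar)(y_j - ybar)
    den = W.sum() * (d @ d)          # (sum_ij w_ij) * sum_i (y_i - ybar)^2
    return num / den

def gearys_C(y, W):
    """Geary's C: values below 1 suggest positive spatial dependence."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    d = y - y.mean()
    diff2 = (y[:, None] - y[None, :]) ** 2   # (y_i - y_j)^2 for all pairs
    num = (n - 1) * np.sum(W * diff2)
    den = 2.0 * W.sum() * (d @ d)
    return num / den
```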

10 Using these asymptotic distributions one can easily construct hypothesis tests of $H_0: E[I] = 0$ against either a one- or two-sided alternative. Another, perhaps preferable, way to test for association is to use a Monte Carlo test for independence. Idea: under the assumption that the $Y_i$'s are iid, the distribution of $I$ (and $C$) is invariant to permutations of the $Y_i$'s. What does this mean?

11 The distribution of $I$ clearly depends on $W$; however, if the spatial structure has no role to play, then permuting the rows of $W$ will not change the distribution of $I$. So $[I \mid W] = [I \mid W^*]$, where $W^*$ is any row permutation of $W$. To carry out a Monte Carlo test for spatial association, we randomly permute the data vector $Y$ (equivalent to permuting the rows of $W$) and calculate the new value of the statistic, say $I^{(1)}$. Repeat this procedure many times, say $n = 999$ times, to obtain $I^{(1)}, I^{(2)}, \ldots, I^{(999)}$, and plot the histogram of these values. We then locate the original observed value $I^{(obs)}$ on this histogram.

12 Under the assumption that the $Y_i$'s are iid, the observed value $I^{(obs)}$ comes from the same distribution as $I^{(1)}, I^{(2)}, \ldots, I^{(999)}$, so $I^{(obs)}$ should lie somewhere in the main body of the histogram. If $I^{(obs)}$ lies in the tails of the histogram, we have evidence against the hypothesis that the $Y_i$'s are iid. We can quantify this by calculating an empirical p-value. If associated with each $Y_i$ is a vector of covariates $x_i$, then even if the $Y_i$'s are spatially independent they may not be identically distributed.
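A sketch of this permutation test in Python, reusing the hypothetical `morans_I` function from the earlier sketch (the one-sided p-value targets positive spatial dependence):

```python
import numpy as np

rng = np.random.default_rng(0)

def moran_permutation_test(y, W, n_perm=999):
    """Monte Carlo test for spatial association: permute the data vector,
    recompute Moran's I each time, and locate I_obs in the permutation
    distribution via a one-sided empirical p-value."""
    I_obs = morans_I(y, W)
    I_perm = np.array([morans_I(rng.permutation(y), W) for _ in range(n_perm)])
    p = (1 + np.sum(I_perm >= I_obs)) / (n_perm + 1)
    return I_obs, I_perm, p
```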

13 As in the point-referenced setting, this suggests applying these techniques to the estimated residuals from standard regression models. Simple smoothing: to filter out noise in the data and produce a smooth map, we can use the $W$ matrix and replace each $Y_i$ with

$$\hat{Y}_i = \frac{\sum_j w_{ij} Y_j}{w_{i+}}, \qquad w_{i+} = \sum_j w_{ij},$$

a weighted average that will encourage the smoothed $Y_i$ to be similar to its neighbors. Problems with this? A possible remedy is

$$\hat{Y}_i^* = (1 - \alpha) Y_i + \alpha \hat{Y}_i \quad \text{for } \alpha \in [0, 1].$$

14 Here, $\alpha = 0$ yields the raw data and $\alpha = 1$ yields a very smooth map. Try different values of $\alpha$ in an exploratory fashion. In Chapter 5 we will discuss hierarchical models for smoothing which incorporate covariate information and spatial random effects. In that setting our smoothed $Y_i$'s will be posterior means $E[Y_i \mid \text{Data}]$.
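A minimal sketch of this exploratory smoother in Python (the function name `smooth` is my own; the sketch assumes every region has at least one neighbor, so all row sums $w_{i+}$ are positive):

```python
import numpy as np

def smooth(y, W, alpha):
    """Exploratory smoother: Yhat*_i = (1 - alpha) y_i + alpha Yhat_i, where
    Yhat_i is the neighbor-weighted average of y."""
    w_plus = W.sum(axis=1)          # w_{i+}: row sums of W
    y_hat = (W @ y) / w_plus        # weighted averages over neighbors
    return (1 - alpha) * y + alpha * y_hat

# alpha = 0 returns the raw map; alpha = 1 returns the fully smoothed map.
```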

15 Markov random fields: in the point-referenced data setting we specified the joint distribution of the observed data $Y_1, \ldots, Y_n$ directly. In the areal setting, where we have $Y_1, \ldots, Y_n$ and a neighborhood matrix $W$, we will take a different approach and build the required joint distribution $f(y_1, \ldots, y_n)$ through the specification of a set of simpler full conditional distributions $f(y_i \mid y_j, j \neq i)$, $i = 1, \ldots, n$. For a given joint distribution $f(y_1, \ldots, y_n)$ we can always obtain unique and well-defined full conditional distributions:

$$f(y_i \mid y_j, j \neq i) = \frac{f(y_1, \ldots, y_n)}{\int f(y_1, \ldots, y_n)\, dy_i}.$$

16 But note that the converse is not always true! We cannot simply write down a set of full conditional distributions $f(y_i \mid y_j, j \neq i)$, $i = 1, \ldots, n$, and claim that these determine a unique $f(y_1, \ldots, y_n)$. Consider two random variables with

$$Y_1 \mid Y_2 \sim N(\alpha_0 + \alpha_1 Y_2, \sigma_1^2) \quad \text{and} \quad Y_2 \mid Y_1 \sim N(\beta_0 + \beta_1 Y_1^3, \sigma_2^2).$$

17 In this case

$$E[Y_1] = E[E[Y_1 \mid Y_2]] = E[\alpha_0 + \alpha_1 Y_2] = \alpha_0 + \alpha_1 E[Y_2],$$

so $E[Y_1]$ is a linear function of $E[Y_2]$. But we also have

$$E[Y_2] = E[E[Y_2 \mid Y_1]] = E[\beta_0 + \beta_1 Y_1^3] = \beta_0 + \beta_1 E[Y_1^3],$$

which is not a linear function of $E[Y_1]$. Both conditions cannot hold (except in trivial cases), and so here the two conditional distributions do not determine a valid and unique joint distribution.

18 In general, when a set of full conditional distributions determines a unique and valid joint distribution, we say that the set of conditional distributions is compatible. Improper distribution: an improper distribution is a distribution with a non-integrable density. That is, if $S$ is the sample space of $Y$, then $\int_S f(y)\, dy = \infty$. When would such an object be useful in statistics? Clearly, an improper distribution is not useful as a model for data. In Bayesian statistics, where parameters are assigned probability distributions, improper distributions may be employed as priors. How?

19 Even though the prior density $\pi(\theta)$ is such that $\int \pi(\theta)\, d\theta = \infty$, having observed data $y$ (assumed to arise from a proper distribution), the corresponding posterior may be proper, $\int \pi(\theta \mid y)\, d\theta < \infty$, and so inference based on this posterior is valid. Such distributions have their uses in Bayesian statistics and in fact are used, as we shall see later, as models for random effects in an areal data setting.

20 Given a set of compatible and proper full conditional distributions $f(y_i \mid y_j, j \neq i)$, $i = 1, \ldots, n$, the resulting joint distribution can be improper! Example: consider the bivariate joint distribution with

$$f(y_1, y_2) \propto \exp\left[-\tfrac{1}{2}(y_1 - y_2)^2\right], \quad (y_1, y_2) \in \mathbb{R}^2.$$

This density has no valid normalizing constant since

$$\int\!\!\int \exp\left[-\tfrac{1}{2}(y_1 - y_2)^2\right] dy_1\, dy_2 = \infty,$$

and so the distribution is improper. What about the corresponding full conditional distributions?

21 Clearly $[Y_1 \mid Y_2 = y_2] \sim N(y_2, 1)$ and $[Y_2 \mid Y_1 = y_1] \sim N(y_1, 1)$, so here we have an example of two compatible and proper full conditional distributions that yield an improper joint distribution. If we have a set of compatible full conditional distributions $f(y_i \mid y_j, j \neq i)$, $i = 1, \ldots, n$, how can we determine the form of the resulting joint distribution $f(y_1, \ldots, y_n)$? Brook's Lemma.

22 Brook's Lemma states that if $\{f(y_i \mid y_j, j \neq i), i = 1, \ldots, n\}$ is a set of compatible full conditional distributions and $y_0 = (y_{10}, \ldots, y_{n0})$ is any fixed point in the support of $f(y_1, \ldots, y_n)$, then

$$f(y_1, \ldots, y_n) = \frac{f(y_1 \mid y_2, \ldots, y_n)}{f(y_{10} \mid y_2, \ldots, y_n)} \cdot \frac{f(y_2 \mid y_{10}, y_3, \ldots, y_n)}{f(y_{20} \mid y_{10}, y_3, \ldots, y_n)} \cdots \frac{f(y_n \mid y_{10}, \ldots, y_{n-1,0})}{f(y_{n0} \mid y_{10}, \ldots, y_{n-1,0})} \cdot f(y_{10}, \ldots, y_{n0}).$$

This gives us the joint distribution up to a normalizing constant; if $f(y_1, \ldots, y_n)$ is proper, then the fact that it integrates to 1 determines the normalizing constant. The $n = 2$ case is worked out below. How should we specify the full conditional distributions so that (1) they are compatible and (2) they are simple enough and yet yield useful spatial structure?
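To make the telescoping mechanics concrete, here is a minimal worked instance of the lemma for $n = 2$; everything follows from writing each joint as conditional times marginal and cancelling the marginals:

```latex
% Brook's lemma for n = 2: y_0 = (y_{10}, y_{20}) is any fixed support point.
\begin{align*}
f(y_1, y_2)
  &= \frac{f(y_1 \mid y_2)}{f(y_{10} \mid y_2)} \, f(y_{10}, y_2)
     && \text{since both sides equal } f(y_1 \mid y_2) f(y_2) / f(y_{10} \mid y_2) \cdot f(y_{10} \mid y_2) \\
  &= \frac{f(y_1 \mid y_2)}{f(y_{10} \mid y_2)} \,
     \frac{f(y_2 \mid y_{10})}{f(y_{20} \mid y_{10})} \, f(y_{10}, y_{20})
     && \text{applying the same step to } f(y_{10}, y_2) \text{ in } y_2 .
\end{align*}
```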

23 We will not worry about (1). To address (2) we will assume that the full conditional distribution of $Y_i$ depends only on its neighbors. That is, the full conditional distribution of $Y_i$ will depend only on those $Y_j$'s that have $w_{ij} \neq 0$. Letting $\partial_i = \{j : w_{ij} \neq 0\}$ denote the set of neighbors of region $i$, this implies

$$f(y_i \mid y_j, j \neq i) = f(y_i \mid y_j, j \in \partial_i), \quad i = 1, \ldots, n.$$

24 This sort of specification for the full conditional distributions, when compatible, is referred to as a Markov random field (MRF) due to the obvious Markovian structure of the full conditional distributions. The idea behind such models is the development of a complicated spatial dependence structure through a set of simple local specifications that depend only on lattice (or map) adjacencies. We will develop and employ these sorts of models as models for areal data or as models for random effects in an areal setting. Clique: A clique is a set of cells (or indices) such that each element in the set is a neighbor of every other element in the set.

25 Think of the graph representation of the neighborhood structure mentioned earlier. A clique represents a set of nodes $M$ on the graph such that each pair of indices $(i, j)$ with both $i$ and $j$ in $M$ corresponds to an edge of the graph. With $n$ spatial units, we can have cliques of size $1, \ldots, n$. Potential function: a potential of order $k$ is a function of $k$ arguments that is exchangeable in its arguments. A potential function of order $k$ typically operates on the variable values $y_{s_1}, \ldots, y_{s_k}$ associated with a clique $\{s_1, \ldots, s_k\}$ of size $k$.

26 Examples with $k = 2$: (1) $y_i y_j$; (2) $(y_i - y_j)^2$; (3) $y_i y_j + (1 - y_i)(1 - y_j)$ for binary data. Gibbs distribution: a joint distribution for $Y_1, \ldots, Y_n$ is a Gibbs distribution if the joint density/pmf $f(y_1, \ldots, y_n)$ takes the form

$$f(y_1, \ldots, y_n) \propto \exp\left\{\gamma \sum_k \sum_{\alpha \in M_k} \phi^{(k)}(y_{\alpha_1}, \ldots, y_{\alpha_k})\right\},$$

where $\phi^{(k)}(\cdot)$ is a potential of order $k$, $M_k$ is the collection of all cliques of size $k$, and $\gamma > 0$ is a parameter.

27 The joint distribution $f(y_1, \ldots, y_n)$ depends on $y_1, \ldots, y_n$ only through potential functions evaluated over the cliques induced by the neighborhood (graph) structure. Note that such a distribution may have more than one parameter: the potential functions themselves may depend on unknown parameters.

28 Hammersley-Clifford theorem: if we have an MRF, then the corresponding joint distribution is a Gibbs distribution. Cliques of order 1 only imply independence; consider the form of the corresponding Gibbs distribution. Distributions having cliques of order 2 are most common. An example is the pairwise difference form

$$f(y_1, \ldots, y_n) \propto \exp\left\{-\frac{1}{2\tau^2} \sum_{i,j} (y_i - y_j)^2\right\},$$

based on quadratic potential functions.

29 Conditionally autoregressive (CAR) models: a particularly popular class of MRF models introduced by J. Besag in 1974. These models have become very popular within the last decade, particularly since the advent of Gibbs sampling. Gibbs sampling is a procedure for simulating realizations from a joint distribution $f(y_1, \ldots, y_n)$ using only the full conditional distributions $\{f(y_i \mid y_j, j \neq i), i = 1, \ldots, n\}$.

30 Useful in Bayesian statistics when we want to draw samples from a posterior distribution of interest. MRF models are ideal in this setting since they are specified in terms of full conditional distributions. More on this later...

31 Autonormal (Gaussian) CAR models: here we begin with the full conditionals

$$[Y_i \mid y_j, j \neq i] \sim N\left(\sum_j b_{ij} y_j,\; \tau_i^2\right), \quad i = 1, \ldots, n.$$

For appropriately chosen $b_{ij}$ these full conditionals are compatible, so using Brook's lemma we can obtain the joint distribution as

$$f(y_1, \ldots, y_n) \propto \exp\left\{-\tfrac{1}{2}\, y' D^{-1}(I - B)\, y\right\},$$

where $B = (b_{ij})$ and $D = \text{diag}\{\tau_1^2, \ldots, \tau_n^2\}$. This looks like a multivariate normal distribution with $\mu = 0$ and $\Sigma_y^{-1} = D^{-1}(I - B)$.

32 This is of course only true if $D^{-1}(I - B)$ is symmetric. We must choose the $b_{ij}$ in the conditional Gaussian distributions to ensure this symmetry. In particular, choosing the $b_{ij}$ so that

$$\frac{b_{ij}}{\tau_i^2} = \frac{b_{ji}}{\tau_j^2} \quad \text{for all } i, j$$

will ensure symmetry (and compatibility). Notice that if $\tau_i^2 \neq \tau_j^2$ then we cannot have $b_{ij} = b_{ji}$. How should we choose the $b_{ij}$'s subject to the above constraints, and also to yield a reasonable joint spatial distribution?

33 We will take the $b_{ij}$'s to be functions of the neighborhood matrix $W$:

$$b_{ij} = \frac{w_{ij}}{w_{i+}}, \qquad \tau_i^2 = \frac{\tau^2}{w_{i+}}.$$

Does this specification satisfy the symmetry condition? With these choices the full conditional distributions are

$$[Y_i \mid y_j, j \neq i] \sim N\left(\sum_j \frac{w_{ij}}{w_{i+}}\, y_j,\; \frac{\tau^2}{w_{i+}}\right), \quad i = 1, \ldots, n.$$

Interpretation?

34 The joint distribution for these choices of $b_{ij}$ and $\tau_i^2$ is

$$f(y_1, \ldots, y_n) \propto \exp\left\{-\frac{1}{2\tau^2}\, y'(D_W - W)\, y\right\},$$

where $D_W = \text{diag}\{w_{1+}, \ldots, w_{n+}\}$. This is again MVN with $\mu = 0$ and $\Sigma_y^{-1} = \tau^{-2}(D_W - W)$. Note here that $(D_W - W)\mathbf{1} = 0$, so $\Sigma_y^{-1}$ is singular! This is a singular MVN distribution: an improper distribution with no valid normalizing constant.
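A small numerical check of this singularity in Python (the 3-region chain here is hypothetical; for any symmetric $W$ with zero diagonal, the rows of $D_W - W$ sum to zero):

```python
import numpy as np

# Hypothetical 3-region chain: region 2 neighbors regions 1 and 3.
W = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
Q = np.diag(W.sum(axis=1)) - W          # D_W - W

print(Q @ np.ones(3))                   # -> [0. 0. 0.]: the vector 1 is in the null space
print(np.linalg.matrix_rank(Q))         # -> 2, so the precision matrix is singular
```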

35 Such a distribution is often referred to as a Gaussian intrinsic autoregression (IAR). To further investigate this impropriety we can rewrite the joint distribution as

$$f(y_1, \ldots, y_n) \propto \exp\left\{-\frac{1}{2\tau^2} \sum_{i<j} w_{ij}(y_i - y_j)^2\right\},$$

a pairwise difference Gibbs distribution with quadratic potentials. What happens to this distribution if we add a constant $\mu$ to all the $Y_i$? Nothing: the density is unchanged, so the $Y_i$'s are not centered. This distribution does not identify an overall mean.

36 To provide the required centering we can impose the constraint $\sum_i Y_i = 0$. Problems with this as a model for data? We cannot expect our data to respect this constraint... This constrained improper distribution cannot be used as a model for data, but it can be used as a model for spatial random effects (a prior for parameters that vary spatially). Perhaps explain this in the context of a map...

37 If we want to use the autonormal model as a distribution for data (as opposed to a prior for spatial random effects), we need an alternative solution to the impropriety problem. We have $(D_W - W)\mathbf{1} = 0$, causing the unfortunate results above. An obvious remedy is to incorporate a constant $\rho$ so that

$$\Sigma_y^{-1} = \tau^{-2}(D_W - \rho W)$$

is non-singular. Such models are often referred to as proper CAR models.

38 How should we choose $\rho$ to ensure non-singularity? Non-singularity is guaranteed provided $\rho \in (1/\lambda_{(1)}, 1/\lambda_{(n)})$, where $\lambda_{(1)} < \lambda_{(2)} < \cdots < \lambda_{(n)}$ are the ordered eigenvalues of $D_W^{-1/2} W D_W^{-1/2}$. It is also possible to show that $\lambda_{(1)} < 0$ and $\lambda_{(n)} > 0$, so that the interval $(1/\lambda_{(1)}, 1/\lambda_{(n)})$ contains 0. How should we choose $\rho$?
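A sketch of computing these bounds in Python, reusing the hypothetical 3-region chain from above:

```python
import numpy as np

W = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
d = W.sum(axis=1)                        # w_{i+}
Dm12 = np.diag(1.0 / np.sqrt(d))         # D_W^{-1/2}
lam = np.linalg.eigvalsh(Dm12 @ W @ Dm12)

print(lam.min(), lam.max())              # smallest < 0, largest = 1 for this W
print(1.0 / lam.min(), 1.0 / lam.max())  # allowable interval for rho
```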

39 Leave $\rho \in (1/\lambda_{(1)}, 1/\lambda_{(n)})$ unspecified as a parameter in our model. Since $\lambda_{(n)} = 1$ here, one usually adopts the simple choice $\rho \in [0, 1)$. Here $\rho = 0$ corresponds to the conditional distributions

$$[Y_i \mid y_j, j \neq i] \sim N\left(0,\; \frac{\tau^2}{w_{i+}}\right), \quad i = 1, \ldots, n,$$

i.e., spatial independence. Further, $\rho \to 1$ corresponds to the IAR model, and larger values of $\rho$ imply a greater degree of spatial dependence.

40 Note that with the IAR model ($\rho = 1$) we only have one parameter, $\tau^2$, the variance component. This variance component does not quantify spatial dependence in any way. With the IAR model, much of the spatial structure imposed by the model is pre-determined by the chosen $W$. Note also that independence does not arise as a special case of this model.

41 Of course one could, in principle, allow the neighborhood structure $W$ itself to be a parameter in the model, but this gets fairly complicated. When the more general CAR model incorporating $\rho$ is employed, how does one interpret $\rho$? Very carefully. In particular, $\rho$ does not represent a correlation. Rather, $\rho$ is some measure of dependence, in the sense that $\rho = 0$ corresponds to independence and spatial dependence increases with $\rho$. The maximum allowable spatial dependence corresponds to the IAR model with $\rho = 1$.

42 To calibrate $\rho$ for a given neighborhood structure and map, one could simulate realizations from the CAR model for different values of $\rho$. For each realization we could compute Moran's I to gauge the strength of the spatial dependence implied by a particular $\rho$ value. A sketch of this calibration follows.
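A minimal sketch of this calibration in Python (the helper name `simulate_proper_car` is my own; it reuses `morans_I` and the hypothetical `W` from the earlier sketches, and assumes $\rho$ lies strictly inside the allowable interval so the precision matrix is positive definite):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_proper_car(W, rho, tau2=1.0, size=1):
    """Draw from N(0, Sigma) with Sigma^{-1} = (D_W - rho W) / tau2."""
    Q = (np.diag(W.sum(axis=1)) - rho * W) / tau2   # precision matrix
    L = np.linalg.cholesky(np.linalg.inv(Q))        # Sigma = L L'
    z = rng.standard_normal((W.shape[0], size))
    return (L @ z).T                                # each row is one realization

# Calibration: average Moran's I over simulated maps for each rho.
for rho in (0.0, 0.5, 0.9, 0.99):
    ys = simulate_proper_car(W, rho, size=200)
    print(rho, np.mean([morans_I(y, W) for y in ys]))
```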

43 In general, even moderate amounts of spatial dependence will require ρ > 0.9 and usually estimates of ρ are close to its upper bound value. When modeling random effects in an areal data setting, I usually fit models based on the proper CAR model as well as the IAR model and then compare the two using some model selection tool. Usually, at least in my experience, the IAR model ends up being the preferred model.

44 I note again that in the framework of this model we specify a joint normal distribution for the data by specifying the inverse covariance matrix $\Sigma_y^{-1} = \tau^{-2}(D_W - \rho W)$, but in general we have no simple form for the covariance matrix itself. The elements of $\Sigma_y$ give us, of course, information on the marginal covariance structure of $Y$. The elements of $\Sigma_y^{-1}$ give us information on the conditional covariance structure of $Y$. For example, using standard results associated with the MVN distribution, we can show that $\text{Var}(Y_i \mid y_j, j \neq i) = 1/(\Sigma_y^{-1})_{ii}$.
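A quick numerical check of this identity in Python, via the standard MVN conditional-variance formula on the hypothetical 3-region chain used above:

```python
import numpy as np

W = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
Q = np.diag(W.sum(axis=1)) - 0.9 * W     # Sigma^{-1} with rho = 0.9, tau^2 = 1
Sigma = np.linalg.inv(Q)

# Var(Y_i | rest) = Sigma_ii - Sigma_{i,rest} Sigma_{rest,rest}^{-1} Sigma_{rest,i}
i, rest = 0, [1, 2]
cond_var = Sigma[i, i] - Sigma[i, rest] @ np.linalg.solve(
    Sigma[np.ix_(rest, rest)], Sigma[rest, i])
print(cond_var, 1.0 / Q[i, i])           # the two values agree
```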

45 Moreover, if $(\Sigma_y^{-1})_{ij} = 0$ then $Y_i$ and $Y_j$ are conditionally independent given $\{y_k, k \neq i, j\}$. We see that $w_{ij} = 0$ implies conditional independence between $Y_i$ and $Y_j$ (given all other $Y$'s). From this we see that the specification of a neighborhood structure $W$ is essentially a set of conditional independence assumptions. Regression: if the proper CAR model is used as a distribution for data, we can accommodate covariates $x_i$ by modifying the conditional distributions to

$$N\left(x_i'\beta + \rho \sum_j \frac{w_{ij}}{w_{i+}}(y_j - x_j'\beta),\; \frac{\tau^2}{w_{i+}}\right), \quad i = 1, \ldots, n.$$

46 With these conditional specifications, the marginal distribution for $Y$ is MVN with $\mu = X\beta$ and $\Sigma_y^{-1} = \tau^{-2}(D_W - \rho W)$. We will mostly be concerned with the $\mu = 0$ case, when CAR models are applied as a (prior) distribution for random effects. Multivariate spatial data: suppose that, associated with each areal unit, we observe several, say $p$, dependent observations $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{ip})'$. Models for these sorts of data must account for the spatial dependence across areal units and also the dependence within each $Y_i$.

47 Multivariate conditional autoregressive (MCAR) models have been developed for such data. The idea is a straightforward extension of the univariate case, where we specify the joint distribution of all $np$ random variables $Y = (Y_1', \ldots, Y_n')'$ through a set of full conditional distributions. These full conditional distributions will be $p$-variate normal instead of univariate normal. Note also that a CAR model can, in principle, be adopted for modeling point-referenced data by allowing the elements of $W$ to depend on the distance between points.

48 This may be useful for very large datasets since CAR models, as we shall see in Chapter 5, are numerically less demanding to fit within a Gibbs sampling framework. When prediction is not of interest, this is a perfectly acceptable way of building a joint distribution. Whether or not such an approach yields an adequate representation of the underlying spatial structure in a given application is a model assessment issue - and a critical one at that.

49 Non-Gaussian CAR models: when dealing with non-Gaussian areal data, our preferred approach will be based on generalized linear mixed models, where we incorporate Gaussian CAR random effects into models for non-Gaussian data (Chapter 5). An alternative to this approach, which we consider now, is to adopt an MRF-type specification for the data $Y_1, \ldots, Y_n$ and determine a joint distribution through the specification of a set of compatible non-Gaussian full conditional distributions.

50 For example, we can allow the full conditional distributions $f(y_i \mid y_j, j \neq i)$ to take Poisson, binomial, gamma, or in fact any form from the exponential family. When these are compatible, the result is a joint spatial distribution for non-Gaussian data. See Cressie (1993) for a full development of CAR models in a general framework. I will present two examples of such non-Gaussian CAR models and discuss the computational problems associated with them.

51 Binary data: for binary $Y_1, \ldots, Y_n$, an autologistic (binary MRF) model specifies the full conditional distributions through $p_i = P(Y_i = 1 \mid y_j, j \neq i) = P(Y_i = 1 \mid y_j, j \in \partial_i)$ and

$$\log\left(\frac{p_i}{1 - p_i}\right) = x_i'\beta + \psi \sum_j w_{ij} y_j,$$

where $\beta$ is a vector of regression parameters and $\psi \in \mathbb{R}$ is a spatial dependence parameter. These full conditional distributions are compatible, and Brook's lemma yields the form of the joint pmf:

$$f(y_1, \ldots, y_n) \propto \exp\left\{\beta'\left(\sum_i y_i x_i\right) + \psi \sum_{i<j} w_{ij} y_i y_j\right\},$$

a Gibbs distribution with potentials on cliques of order 2.

52 We can, in principle, use this form to fit the model and obtain, for example, MLEs of $\beta$ and $\psi$. Unfortunately, a computational problem arises: the normalizing constant in $f(y_1, \ldots, y_n)$ depends on the model parameters,

$$f(y_1, \ldots, y_n) = C(\beta, \psi) \exp\left\{\beta'\left(\sum_i y_i x_i\right) + \psi \sum_{i<j} w_{ij} y_i y_j\right\},$$

and so it would need to be evaluated at each iteration of the maximization procedure. Note that

$$C(\beta, \psi)^{-1} = \sum_{y_1=0}^{1} \cdots \sum_{y_n=0}^{1} \exp\left\{\beta'\left(\sum_i y_i x_i\right) + \psi \sum_{i<j} w_{ij} y_i y_j\right\}.$$

53 Evaluating this constant for any particular value of $\beta$ and $\psi$ requires summing $2^n$ terms, which is not feasible even for moderate $n$, particularly since we would have to do this iteratively. Evaluating the normalizing constant is also required for Bayesian inference. Pseudo-likelihood, a somewhat ad hoc inferential scheme, can be employed to avoid the calculation of the normalizing constant (see the sketch below). The autologistic model can be generalized to the case where each $Y_i$ is categorical and takes values in the set $\{0, 1, \ldots, L-1\}$ for some $L \geq 2$.
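A minimal sketch of the pseudo-likelihood idea in Python (the function name is my own): the pseudo log-likelihood is the sum of the log full-conditional likelihoods, each of which is an ordinary logistic term, so $C(\beta, \psi)$ never appears.

```python
import numpy as np

def autologistic_pseudo_loglik(params, y, X, W):
    """Pseudo log-likelihood for the autologistic model: the sum over i of the
    log full-conditional (logistic) likelihoods, which sidesteps C(beta, psi)."""
    beta, psi = params[:-1], params[-1]
    eta = X @ beta + psi * (W @ y)            # logit of p_i given the neighbors
    p = 1.0 / (1.0 + np.exp(-eta))
    return np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

# Maximizing over (beta, psi) -- e.g. passing the negative of this function to
# scipy.optimize.minimize -- gives the maximum pseudo-likelihood estimate.
```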

54 In this case the full conditional distributions are defined by

$$P(Y_i = l \mid y_j, j \neq i) \propto \exp\left(\psi \sum_{j \in \partial_i} w_{ij}\, I(y_j = l)\right),$$

where $\psi \in \mathbb{R}$ is again a spatial dependence parameter. Covariates can be added to this model just as in the autologistic case. This model, referred to as the Potts model, can be used to model allocations in finite mixture models, providing a robust alternative to the usual Gaussian spatial random effects models. As before, the model contains a normalizing constant $C(\psi)$ that causes computational problems when fitting the model.

55 Simultaneous autoregressive (SAR) models: MRF models such as the CAR models we have discussed are by far the most popular models for areal data. An alternative class of models for areal data can be based on an autoregressive structure similar to that adopted in time series modeling. As before, we have data $Y_1, \ldots, Y_n$ and spatial information $W$. Unlike the MRF approach, we do not focus on full conditionals in this framework.

56 Instead, we start with a vector of independent errors or innovations $e \sim MVN(0, D)$ with $D = \text{diag}\{\sigma_1^2, \ldots, \sigma_n^2\}$, or more simply $D = \sigma^2 I$. We then construct a simple functional relationship between $Y$ and $e$, and this relationship induces a distribution for $Y$. Consider the relationship

$$Y_i = \sum_j b_{ij} Y_j + e_i, \quad i = 1, \ldots, n,$$

for some constants $b_{ij}$ with $b_{ii} = 0$.

57 In matrix form this is $Y = BY + e$, where $B = (b_{ij})$. From this we can obtain the relationship between $Y$ and $e$:

$$Y = (I - B)^{-1} e,$$

assuming $I - B$ is invertible. The simple distribution assigned to $e$ then induces the following distribution for $Y$:

$$Y \sim MVN\left(0,\; (I - B)^{-1} D \left[(I - B)^{-1}\right]'\right),$$

and when $D = \sigma^2 I$ this is just

$$Y \sim MVN\left(0,\; \sigma^2 (I - B)^{-1} \left[(I - B)^{-1}\right]'\right).$$

58 To ensure that $I - B$ is invertible, we can take $B = \rho W$ and restrict $\rho$ to an appropriate range. Invertibility is ensured when $\rho \in (1/\lambda_{(1)}, 1/\lambda_{(n)})$, where $\lambda_{(1)}$ and $\lambda_{(n)}$ are the smallest and largest eigenvalues of $W$. The SAR model is then based on

$$\Sigma_y = \sigma^2 \left[(I - \rho W)(I - \rho W)'\right]^{-1},$$

where $\rho$ is referred to as the autoregression parameter, with $\rho = 0$ corresponding to $\Sigma_y = \sigma^2 I$, an independence model.
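A minimal sketch of simulating from this model in Python (the helper name `simulate_sar` is my own; the draw uses the defining relation directly rather than forming $\Sigma_y$):

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_sar(W, rho, sigma2=1.0):
    """One draw from the zero-mean SAR model: solve (I - rho W) Y = e with
    e ~ N(0, sigma2 I), assuming I - rho W is invertible."""
    n = W.shape[0]
    e = np.sqrt(sigma2) * rng.standard_normal(n)
    return np.linalg.solve(np.eye(n) - rho * W, e)   # Y = (I - rho W)^{-1} e
```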

59 Regression: when covariates are present, the SAR model can be adopted as a model for the residuals. In this case we define $U = Y - X\beta$ and assume $U$ follows a SAR model, so that

$$(I - \rho W) U = e \;\Longleftrightarrow\; (I - \rho W)(Y - X\beta) = e \;\Longleftrightarrow\; Y = \rho W Y + (I - \rho W) X\beta + e.$$

Note here that if $W = 0$ this is the standard linear model. Note that the spatial covariance structure implied by the SAR model, just as with the CAR model, is not entirely intuitive.

60 In addition, the SAR models, unlike the CAR models, are not based on a set of full conditional distributions. These of course exist, but they do not have a computationally convenient form. As a result, SAR models are not well suited to model fitting using the Gibbs sampler. Finally, Cressie (1993) shows that any SAR model can be represented as a CAR model; however, the converse is not true: there exist CAR models that do not have a representation as a SAR model. Given the above, we will not consider SAR models further in this course.

61 I note, however, that the general approach of building spatial distributions using transformations of independent random variables is a simple, intuitive, and appealing approach. Other similar approaches could (and should) be explored further...
