Spatial Statistics Chapter 3 Basics of areal data and areal data modeling


1 Spatial Statistics Chapter 3 Basics of areal data and areal data modeling. Recall that areal data, also known as lattice data, are data $Y(s)$, $s \in D$, where $D$ is a discrete index set. This usually corresponds to data $Y_1, \ldots, Y_n$ observed on a set of geographical units (over a map), the pixels of an image, or a regular arrangement of points on a lattice.

2 Models for areal data are also sometimes employed for irregularly arranged point-referenced data sets when the number of spatial units is very large, for computational reasons.

3 As we shall see in Chapter 5, certain types of areal models are computationally easier to work with and ideal for use with the Gibbs sampler. In this setting, unlike the geostatistical one, we are typically not interested in prediction and have observed data at all spatial sites. What is of interest in this setting? Is a spatial pattern evident? Are there clusters of high/low values?

4 Smoothing: Filter out some of the noise in the data to help elucidate the spatial pattern. Deciding how much to smooth the data is not always clear. Smoother maps are easier to interpret but will generally not represent the data well, and vice versa. Example: No smoothing at all is equivalent to presenting a raw map of the data. Extreme smoothing would involve associating the same value $\bar{Y}$ with all units. Optimal smoothing lies somewhere between these two extremes.

5 Also of interest in this setting is relating the response to covariates through regression models; we need to account for spatial dependence in such regression models. Also in the regression setting, we would be interested in examining the residual spatial structure after accounting for covariates. Exploratory methods for areal data: Recall that the primary source of spatial information in the areal setting consists of adjacencies: knowing, for each region, all the neighboring regions (for some appropriate definition of neighbor), i.e., the arrangement of the regions across the map.

6 This adjacency structure is quantified through the neighborhood (or proximity) matrix $W$:

$W_{ij} = 0$ if $i = j$; $\quad W_{ij} = 0$ if $i$ and $j$ are not neighbors; $\quad W_{ij} = c_{ij} > 0$ if $i$ and $j$ are neighbors,

where $c_{ij}$ quantifies the strength of the neighbor relationship. Most often $c_{ij} = 1$ for all neighbor pairs, and two regions are considered neighbors if they share a common boundary. It is instructive to think of this spatial structure as a graph, where nodes correspond to regions and two nodes on the graph are connected if the associated regions are neighbors.
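As a minimal sketch of this construction, the following builds a binary first-order $W$ from a neighbor list for a hypothetical four-region map (the regions and adjacencies are made up for illustration):

```python
import numpy as np

# Hypothetical 4-region map: region 0 borders 1 and 2, region 3 borders 1 and 2.
neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}

n = len(neighbors)
W = np.zeros((n, n))
for i, nbrs in neighbors.items():
    for j in nbrs:
        W[i, j] = 1.0          # c_ij = 1 for every neighbor pair

assert np.allclose(W, W.T)      # the neighbor relation is symmetric
assert np.all(np.diag(W) == 0)  # W_ii = 0 by convention
print(W.sum(axis=1))            # w_{i+}: the number of neighbors of each region
```

Viewed as a graph, each row of $W$ simply lists the edges incident to that node.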

7 The neighborhood matrix $W$ can be used for exploratory analysis and will also be used when we discuss models for areal data. Note that it is also possible to define 2nd-order neighbors and to have a corresponding 2nd-order neighborhood matrix. After simply plotting the data (usually on a map in this case), an exploratory analysis usually proceeds with an attempt to quantify the strength of spatial association in the data.

8 For this, two statistics can be employed: 1. Moran's I:

$I = \dfrac{n \sum_i \sum_j w_{ij} (Y_i - \bar{Y})(Y_j - \bar{Y})}{\left(\sum_i \sum_j w_{ij}\right) \sum_i (Y_i - \bar{Y})^2}$

where $I \approx 0$ indicates no spatial dependence, $I > 0$ positive spatial dependence, and $I < 0$ negative spatial dependence. Moran's I can be thought of as an areal correlation coefficient.

9 2. Geary's C:

$C = \dfrac{(n-1) \sum_i \sum_j w_{ij} (Y_i - Y_j)^2}{2 \left(\sum_i \sum_j w_{ij}\right) \sum_i (Y_i - \bar{Y})^2}$

where $C \geq 0$; $C \approx 1$ indicates no spatial dependence, $C < 1$ positive spatial dependence, and $C > 1$ negative spatial dependence. Under the hypothesis that the $Y_i$'s are iid, one can show that the asymptotic distributions of both statistics are normal and that $E[I] = 0$ and $E[C] = 1$.
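A direct translation of the two formulas, applied to a toy one-dimensional "map" of five regions in a row (an assumed example, not data from the notes):

```python
import numpy as np

def morans_i(y, W):
    """Moran's I: an areal analogue of a correlation coefficient."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    d = y - y.mean()
    num = n * (d @ W @ d)            # n * sum_ij w_ij (Y_i - Ybar)(Y_j - Ybar)
    den = W.sum() * (d @ d)          # (sum_ij w_ij) * sum_i (Y_i - Ybar)^2
    return num / den

def gearys_c(y, W):
    """Geary's C: values near 1 indicate no spatial dependence."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    d = y - y.mean()
    diff2 = (y[:, None] - y[None, :]) ** 2   # (Y_i - Y_j)^2 for every pair
    num = (n - 1) * (W * diff2).sum()
    den = 2 * W.sum() * (d @ d)
    return num / den

# Path-graph W: adjacent regions in the row are neighbors.
W = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
y_trend = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # a strong spatial trend
print(morans_i(y_trend, W))   # -> 0.5  (positive spatial dependence)
print(gearys_c(y_trend, W))   # -> 0.2  (< 1: positive spatial dependence)
```

Both statistics agree here: neighbors carry similar values, so $I > 0$ and $C < 1$.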

10 Using these asymptotic distributions one can easily construct a hypothesis test of $H_0: E[I] = 0$ against either a one- or two-sided alternative. Another, perhaps preferable, way to test for association is to use a Monte Carlo test for independence. Idea: Under the assumption that the $Y_i$'s are iid, the distribution of $I$ (and $C$) is invariant to permutations of the $Y_i$'s. What does this mean?

11 The distribution of $I$ clearly depends on $W$; however, if the spatial structure has no role to play, then permuting the rows of $W$ will not change the distribution of $I$. So $[I \mid W] \sim [I \mid W^*]$ where $W^*$ is any row permutation of $W$. To carry out a Monte Carlo test for spatial association, we randomly permute the data vector $Y$ (equivalent to permuting the rows of $W$) and calculate the new value of the statistic, say $I^{(1)}$. Repeat this procedure many times, say 999 times: $I^{(1)}, I^{(2)}, \ldots, I^{(999)}$, and plot the histogram of these values. We then locate the original observed value $I^{(\text{obs})}$ on this histogram.

12 Under the assumption that the $Y_i$'s are iid, the observed value $I^{(\text{obs})}$ comes from the same distribution as $I^{(1)}, I^{(2)}, \ldots, I^{(999)}$, so $I^{(\text{obs})}$ should lie somewhere in the main body of the histogram. If $I^{(\text{obs})}$ lies in the tails of the histogram, we have evidence against the hypothesis that the $Y_i$'s are iid. We can quantify this by calculating an empirical p-value. If associated with each $Y_i$ is a vector of covariates $x_i$, then even in the absence of spatial dependence the $Y_i$'s may not be identically distributed.
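The permutation test above can be sketched in a few lines. The map (a path of 10 regions), the trend in $y$, and the number of permutations are all assumed for illustration:

```python
import numpy as np

def morans_i(y, W):
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    return len(y) * (d @ W @ d) / (W.sum() * (d @ d))

rng = np.random.default_rng(0)

# Toy 1-d map of 10 regions; a monotone trend gives genuine spatial association.
n = 10
W = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
y = np.arange(n, dtype=float)

i_obs = morans_i(y, W)
# Null distribution: under iid Y's, relabelling the regions leaves I unchanged
# in distribution, so each permutation of y gives one draw from the null.
i_perm = np.array([morans_i(rng.permutation(y), W) for _ in range(999)])

# Empirical one-sided p-value: rank of the observed value among the permutations.
p = (1 + np.sum(i_perm >= i_obs)) / (1 + len(i_perm))
print(i_obs, p)
```

Because the trend makes neighbors similar, $I^{(\text{obs})}$ lands far in the upper tail of the permutation histogram and the empirical p-value is small.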

13 As in the point-referenced setting, this suggests applying these techniques to the estimated residuals from standard regression models. Simple Smoothing: To filter out noise in the data and produce a smooth map, we can use the $W$ matrix and replace each $Y_i$ with

$\hat{Y}_i = \sum_j \dfrac{w_{ij}}{w_{i+}} Y_j, \qquad w_{i+} = \sum_j w_{ij},$

a weighted average that will encourage the smoothed $Y_i$ to be similar to its neighbors. Problems with this? A possible remedy is

$\hat{Y}_i^* = (1 - \alpha) Y_i + \alpha \hat{Y}_i \quad \text{for } \alpha \in [0, 1].$

14 Here, $\alpha = 0$ yields the raw data and $\alpha = 1$ yields a very smooth map. Try different values of $\alpha$ in an exploratory fashion. In Chapter 5 we will discuss hierarchical models for smoothing, which will incorporate covariate information and spatial random effects. In that setting our smoothed $Y_i$'s will be posterior means $E[Y_i \mid \text{Data}]$.
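The two smoothing formulas above can be sketched directly; the five-region map and the noisy spike are assumed for illustration:

```python
import numpy as np

def smooth(y, W, alpha):
    """Shrink each Y_i toward the average of its neighbors by a factor alpha."""
    y = np.asarray(y, dtype=float)
    w_plus = W.sum(axis=1)                 # w_{i+} = sum_j w_ij
    y_hat = (W @ y) / w_plus               # neighborhood averages
    return (1 - alpha) * y + alpha * y_hat

# Toy 1-d map of 5 regions; region 2 is a noisy spike.
W = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
y = np.array([1.0, 1.0, 5.0, 1.0, 1.0])

print(smooth(y, W, 0.0))   # alpha = 0: the raw data, [1. 1. 5. 1. 1.]
print(smooth(y, W, 0.5))   # alpha = 0.5: the spike is pulled in, [1. 2. 3. 2. 1.]
```

Varying $\alpha$ between 0 and 1 traces out the full range between the raw map and a heavily smoothed one.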

15 Markov Random Fields: In the point-referenced data setting we specified the joint distribution of the observed data $Y_1, \ldots, Y_n$ directly. In the areal setting, where we have $Y_1, \ldots, Y_n$ and a neighborhood matrix $W$, we will take a different approach and build the required joint distribution $f(y_1, \ldots, y_n)$ through the specification of a set of simpler full conditional distributions $f(y_i \mid y_j, j \neq i)$, $i = 1, \ldots, n$. For a given joint distribution $f(y_1, \ldots, y_n)$ we can always obtain unique and well-defined conditional distributions:

$f(y_i \mid y_j, j \neq i) = \dfrac{f(y_1, \ldots, y_n)}{\int f(y_1, \ldots, y_n)\, dy_i}$

16 But note that the converse is not always true! We cannot simply write down a set of full conditional distributions $f(y_i \mid y_j, j \neq i)$, $i = 1, \ldots, n$, and claim that these determine a unique $f(y_1, \ldots, y_n)$. Consider two random variables with $Y_1 \mid Y_2 \sim N(\alpha_0 + \alpha_1 Y_2, \sigma_1^2)$ and $Y_2 \mid Y_1 \sim N(\beta_0 + \beta_1 Y_1^3, \sigma_2^2)$.

17 In this case

$E[Y_1] = E[E[Y_1 \mid Y_2]] = E[\alpha_0 + \alpha_1 Y_2] = \alpha_0 + \alpha_1 E[Y_2],$

so $E[Y_1]$ is a linear function of $E[Y_2]$. But we also have

$E[Y_2] = E[E[Y_2 \mid Y_1]] = E[\beta_0 + \beta_1 Y_1^3] = \beta_0 + \beta_1 E[Y_1^3],$

which is not a linear function of $E[Y_1]$. Both conditions cannot hold (except in trivial cases), and so here the two conditional distributions do not determine a valid and unique joint distribution.

18 In general, when a set of full conditional distributions determines a unique and valid joint distribution, we say that the set of conditional distributions is compatible. Improper distribution: An improper distribution is a distribution with a non-integrable density. That is, if $S$ is the sample space of $Y$, then $\int_S f(y)\, dy = \infty$. When would such an object be useful in statistics? Clearly, an improper distribution is not useful as a model for data. In Bayesian statistics, where parameters are assigned probability distributions, improper distributions may be employed as priors. How?

19 Even though the prior density $\pi(\theta)$ is such that $\int \pi(\theta)\, d\theta = \infty$, having observed data $y$ (assumed to arise from a proper distribution), the corresponding posterior may be proper, $\int \pi(\theta \mid y)\, d\theta < \infty$, and so inference based on this posterior is valid. Such distributions have their uses in Bayesian statistics and in fact are used, as we shall see later, as models for random effects in an areal data setting.

20 Given a set of compatible and proper full conditional distributions $f(y_i \mid y_j, j \neq i)$, $i = 1, \ldots, n$, the resulting joint distribution can be improper! Example: consider the bivariate joint distribution with

$f(y_1, y_2) \propto \exp[-\tfrac{1}{2}(y_1 - y_2)^2], \quad (y_1, y_2) \in \mathbb{R}^2.$

This density has no valid normalizing constant since

$\int \int \exp[-\tfrac{1}{2}(y_1 - y_2)^2]\, dy_1\, dy_2 = \infty,$

and so the distribution is improper. What about the corresponding full conditional distributions?

21 Clearly $[Y_1 \mid Y_2 = y_2] \sim N(y_2, 1)$ and $[Y_2 \mid Y_1 = y_1] \sim N(y_1, 1)$, so here we have an example of two compatible and proper full conditional distributions that yield an improper joint distribution. If we have a set of compatible full conditional distributions $f(y_i \mid y_j, j \neq i)$, $i = 1, \ldots, n$, how can we determine the form of the resulting joint distribution $f(y_1, \ldots, y_n)$? Brook's Lemma.

22 Brook's Lemma notes that if $\{f(y_i \mid y_j, j \neq i), i = 1, \ldots, n\}$ is a set of compatible full conditional distributions and $y_0 = (y_{10}, \ldots, y_{n0})$ is any fixed point in the support of $f(y_1, \ldots, y_n)$, then

$f(y_1, \ldots, y_n) = \dfrac{f(y_1 \mid y_2, \ldots, y_n)}{f(y_{10} \mid y_2, \ldots, y_n)} \cdot \dfrac{f(y_2 \mid y_{10}, y_3, \ldots, y_n)}{f(y_{20} \mid y_{10}, y_3, \ldots, y_n)} \cdots \dfrac{f(y_n \mid y_{10}, \ldots, y_{n-1,0})}{f(y_{n0} \mid y_{10}, \ldots, y_{n-1,0})} \cdot f(y_{10}, \ldots, y_{n0}).$

This gives us the joint distribution up to a normalizing constant. If $f(y_1, \ldots, y_n)$ is proper, then the fact that it integrates to 1 determines the normalizing constant. How should we specify the full conditional distributions so that (1) they are compatible and (2) they are simple enough and yet yield useful spatial structure?

23 We will not worry about (1). To address (2) we will assume that the full conditional distribution of $Y_i$ depends only on its neighbors. That is, the full conditional distribution of $Y_i$ will depend only on those $Y_j$'s that have $W_{ij} \neq 0$. Letting $\partial_i = \{j : W_{ij} \neq 0\}$ denote the set of neighbors of region $i$, this implies

$f(y_i \mid y_j, j \neq i) = f(y_i \mid y_j, j \in \partial_i), \quad i = 1, \ldots, n.$

24 This sort of specification for the full conditional distributions, when compatible, is referred to as a Markov random field (MRF) due to the obvious Markovian structure of the full conditional distributions. The idea behind such models is the development of a complicated spatial dependence structure through a set of simple local specifications that depend only on lattice (or map) adjacencies. We will develop and employ these sorts of models as models for areal data or as models for random effects in an areal setting. Clique: A clique is a set of cells (or indices) such that each element in the set is a neighbor of every other element in the set.

25 Think of the graph representation of the neighborhood structure mentioned earlier. A clique represents a set of nodes $M$ on the graph such that each pair of indices $(i, j)$ with both $i$ and $j$ in $M$ represents an edge of the graph. With $n$ spatial units, we can have cliques of size $1, \ldots, n$. Potential function: A potential of order $k$ is a function of $k$ arguments that is exchangeable in its arguments. A potential function of order $k$ typically operates on the variable values $y_{s_1}, \ldots, y_{s_k}$ associated with a clique $\{s_1, \ldots, s_k\}$ of size $k$.

26 Examples for $k = 2$: 1. $y_i y_j$; 2. $(y_i - y_j)^2$; 3. $y_i y_j + (1 - y_i)(1 - y_j)$ for binary data. Gibbs Distribution: A joint distribution for $Y_1, \ldots, Y_n$ is a Gibbs distribution if the joint density/pmf $f(y_1, \ldots, y_n)$ takes the following form:

$f(y_1, \ldots, y_n) \propto \exp\{\gamma \sum_k \sum_{\alpha \in M_k} \phi^{(k)}(y_{\alpha_1}, \ldots, y_{\alpha_k})\},$

where $\phi^{(k)}(\cdot)$ is a potential of order $k$, $M_k$ is the collection of all cliques of size $k$, and $\gamma > 0$ is a parameter.

27 The joint distribution $f(y_1, \ldots, y_n)$ depends on $y_1, \ldots, y_n$ only through potential functions evaluated over the cliques induced by the neighborhood (graph) structure. Note that such a distribution may have more than one parameter: the potential functions may themselves depend on unknown parameters.

28 Hammersley-Clifford Theorem: If we have an MRF, then the corresponding joint distribution is a Gibbs distribution. Having only cliques of order 1 corresponds to independence; consider the form of the corresponding Gibbs distribution. Distributions having cliques of order 2 are most common. An example is the pairwise difference form

$f(y_1, \ldots, y_n) \propto \exp\{-\dfrac{1}{2\tau^2} \sum_{i \sim j} (y_i - y_j)^2\},$

based on quadratic potential functions, where the sum is over neighbor pairs $i \sim j$.

29 Conditionally autoregressive (CAR) models: A particularly popular class of MRF models introduced by J. Besag in 1974. These models have become very popular within the last decade, particularly since the advent of Gibbs sampling. Gibbs sampling is a procedure for simulating realizations from a joint distribution $f(y_1, \ldots, y_n)$ using only the full conditional distributions $\{f(y_i \mid y_j, j \neq i), i = 1, \ldots, n\}$.

30 Useful in Bayesian statistics when we want to draw samples from a posterior distribution of interest. MRF models are ideal in this setting since they are specified in terms of full conditional distributions. More on this later...

31 Autonormal (Gaussian) CAR models: Here we begin with the full conditionals

$[Y_i \mid y_j, j \neq i] \sim N(\sum_j b_{ij} y_j, \tau_i^2), \quad i = 1, \ldots, n.$

For appropriately chosen $b_{ij}$ these full conditionals are compatible, so using Brook's lemma we can obtain the joint distribution as

$f(y_1, \ldots, y_n) \propto \exp\{-\tfrac{1}{2} y' D^{-1}(I - B) y\},$

where $B = (b_{ij})$ and $D = \text{diag}\{\tau_1^2, \ldots, \tau_n^2\}$. This looks like a multivariate normal distribution with $\mu = 0$ and $\Sigma_y^{-1} = D^{-1}(I - B)$.

32 This is of course only true if $D^{-1}(I - B)$ is symmetric. We must choose the $b_{ij}$ in the conditional Gaussian distributions to ensure this symmetry. In particular, choosing the $b_{ij}$ so that $b_{ij}/\tau_i^2 = b_{ji}/\tau_j^2$ for all $i, j$ will ensure symmetry (and compatibility). Notice that if $\tau_i^2 \neq \tau_j^2$ then we cannot have $b_{ij} = b_{ji}$. How do we choose the $b_{ij}$'s subject to the above constraints, and also to yield a reasonable joint spatial distribution?

33 We will take the $b_{ij}$'s to be functions of the neighborhood matrix $W$:

$b_{ij} = \dfrac{w_{ij}}{w_{i+}}, \qquad \tau_i^2 = \dfrac{\tau^2}{w_{i+}}.$

Does this specification satisfy the symmetry condition? With these choices the full conditional distributions are

$[Y_i \mid y_j, j \neq i] \sim N(\sum_j \dfrac{w_{ij}}{w_{i+}} y_j, \dfrac{\tau^2}{w_{i+}}), \quad i = 1, \ldots, n.$

Interpretation?

34 The joint distribution for these choices of $b_{ij}$ and $\tau_i^2$ is

$f(y_1, \ldots, y_n) \propto \exp\{-\dfrac{1}{2\tau^2} y'(D_W - W) y\},$

where $D_W = \text{diag}\{w_{1+}, \ldots, w_{n+}\}$. This is again MVN with $\mu = 0$ and $\Sigma_y^{-1} = \tau^{-2}(D_W - W)$. Note here that $(D_W - W)\mathbf{1} = 0$, so $\Sigma_y^{-1}$ is singular! This is a singular MVN distribution: an improper distribution with no valid normalizing constant.
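The singularity is easy to check numerically. The sketch below builds $D_W - W$ for a hypothetical 3x3 grid map with rook (shared-edge) adjacency and verifies that the all-ones vector lies in its null space:

```python
import numpy as np

# W for a 3x3 grid with rook (shared-edge) adjacency; the map is assumed.
n_side = 3
n = n_side * n_side
W = np.zeros((n, n))
for i in range(n_side):
    for j in range(n_side):
        k = i * n_side + j
        if j + 1 < n_side: W[k, k + 1] = W[k + 1, k] = 1.0   # east neighbor
        if i + 1 < n_side: W[k, k + n_side] = W[k + n_side, k] = 1.0  # south

D_W = np.diag(W.sum(axis=1))
Q = D_W - W                      # the IAR precision matrix (up to 1/tau^2)

print(Q @ np.ones(n))            # (D_W - W) 1 = 0: a vector of zeros
print(np.linalg.matrix_rank(Q))  # rank n - 1 < n, so Q is singular
```

For a connected map the rank deficiency is exactly one, corresponding to the unidentified overall level discussed next.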

35 Such a distribution is often referred to as a Gaussian intrinsic autoregression (IAR). To further investigate this impropriety we can rewrite the joint distribution as

$f(y_1, \ldots, y_n) \propto \exp\{-\dfrac{1}{2\tau^2} \sum_{i \sim j} w_{ij} (y_i - y_j)^2\},$

a pairwise-difference Gibbs distribution with quadratic potentials, where the sum is over neighbor pairs. What happens to this distribution if we add a constant $\mu$ to all the $Y_i$? Nothing: the $Y_i$'s are not centered. This distribution does not identify an overall mean.

36 To provide the required centering we can impose the constraint $\sum_i Y_i = 0$. Problems with this as a model for data? We cannot expect our data to respect this constraint. This constrained improper distribution cannot be used as a model for data, but can be used as a model for spatial random effects (a prior for parameters that vary spatially). Perhaps explain this in the context of a map...

37 If we want to use the autonormal model as a distribution for data (as opposed to a prior for spatial random effects), we need an alternative solution to the impropriety problem. We have $(D_W - W)\mathbf{1} = 0$, which causes these unfortunate results. An obvious remedy is to incorporate a constant $\rho$ so that $\Sigma_y^{-1} = \tau^{-2}(D_W - \rho W)$ is non-singular. Such models are often referred to as proper CAR models.

38 How do we choose $\rho$ to ensure non-singularity? Non-singularity is guaranteed provided $\rho \in (1/\lambda_{(1)}, 1/\lambda_{(n)})$, where $\lambda_{(1)} < \lambda_{(2)} < \cdots < \lambda_{(n)}$ are the ordered eigenvalues of $D_W^{-1/2} W D_W^{-1/2}$. It is also possible to show that $\lambda_{(1)} < 0$ and $\lambda_{(n)} > 0$, so that the interval $(1/\lambda_{(1)}, 1/\lambda_{(n)})$ contains 0. How to choose $\rho$?
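The admissible interval for $\rho$ can be computed directly. A minimal sketch, using the same hypothetical 3x3 rook-adjacency grid as before:

```python
import numpy as np

# Hypothetical 3x3 rook-adjacency grid map.
n_side = 3
n = n_side * n_side
W = np.zeros((n, n))
for i in range(n_side):
    for j in range(n_side):
        k = i * n_side + j
        if j + 1 < n_side: W[k, k + 1] = W[k + 1, k] = 1.0
        if i + 1 < n_side: W[k, k + n_side] = W[k + n_side, k] = 1.0

d = W.sum(axis=1)
M = W / np.sqrt(np.outer(d, d))            # D_W^{-1/2} W D_W^{-1/2}
lam = np.linalg.eigvalsh(M)                # eigenvalues in ascending order

lo, hi = 1 / lam[0], 1 / lam[-1]
print(lam[0], lam[-1])                     # lam_(1) < 0 and lam_(n) = 1
print(lo, hi)                              # the admissible interval contains 0

# Sanity check: D_W - rho*W is positive definite for rho inside the interval.
rho = 0.9
assert np.all(np.linalg.eigvalsh(np.diag(d) - rho * W) > 0)
```

The upper endpoint is 1 here, which is why the convenient range $\rho \in [0, 1)$ mentioned next is available.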

39 Leave $\rho \in (1/\lambda_{(1)}, 1/\lambda_{(n)})$ unspecified, as a parameter in our model. One usually adopts the simple choice $\rho \in [0, 1)$, since $\lambda_{(n)} = 1$. Here $\rho = 0$ corresponds to conditional distributions $[Y_i \mid y_j, j \neq i] \sim N(0, \tau^2/w_{i+})$, $i = 1, \ldots, n$, i.e., spatial independence. Further, $\rho \to 1$ corresponds to the IAR model, and larger values of $\rho$ imply a greater degree of spatial dependence.

40 Note that with the IAR model ($\rho = 1$) we have only one parameter, $\tau^2$, the variance component. This variance component does not quantify spatial dependence in any way. With the IAR model, much of the spatial structure imposed by the model is predetermined by the chosen $W$. Note also that independence does not arise as a special case of this model.

41 Of course one could, in principle, allow the neighborhood structure $W$ itself to be a parameter in the model, but this is fairly complicated. When the more general CAR model incorporating $\rho$ is employed, how does one interpret $\rho$? Very carefully. In particular, $\rho$ does not represent a correlation. Rather, $\rho$ is some measure of dependence in the sense that $\rho = 0$ corresponds to independence and spatial dependence increases with $\rho$. The maximum allowable spatial dependence corresponds to the IAR model when $\rho = 1$.

42 To calibrate $\rho$ for a given neighborhood structure and map, one could simulate realizations from the CAR model for different values of $\rho$. For each realization we could compute Moran's I to get a sense of the strength of the spatial dependence implied by a particular $\rho$ value.
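A sketch of this calibration exercise, on an assumed 6x6 rook-adjacency grid with $\tau^2 = 1$ (the grid size, number of draws, and $\rho$ values are all arbitrary choices for illustration):

```python
import numpy as np

def morans_i(y, W):
    d = y - y.mean()
    return len(y) * (d @ W @ d) / (W.sum() * (d @ d))

rng = np.random.default_rng(1)

# Hypothetical 6x6 rook-adjacency grid map.
m = 6
n = m * m
W = np.zeros((n, n))
for i in range(m):
    for j in range(m):
        k = i * m + j
        if j + 1 < m: W[k, k + 1] = W[k + 1, k] = 1.0
        if i + 1 < m: W[k, k + m] = W[k + m, k] = 1.0
D_W = np.diag(W.sum(axis=1))

results = {}
for rho in [0.0, 0.5, 0.9, 0.99]:
    cov = np.linalg.inv(D_W - rho * W)             # proper CAR with tau^2 = 1
    draws = rng.multivariate_normal(np.zeros(n), cov, size=200)
    results[rho] = np.mean([morans_i(y, W) for y in draws])
    print(f"rho = {rho:4.2f}  average Moran's I = {results[rho]:5.2f}")
```

Running this shows the point made below: Moran's I creeps up slowly, and only values of $\rho$ quite close to 1 imply visibly strong spatial association.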

43 In general, even moderate amounts of spatial dependence will require $\rho > 0.9$, and usually estimates of $\rho$ are close to the upper bound. When modeling random effects in an areal data setting, I usually fit models based on the proper CAR model as well as the IAR model and then compare the two using some model selection tool. Usually, at least in my experience, the IAR model ends up being the preferred model.

44 I note again that in the framework of this model we specify a joint normal distribution for the data and specify the inverse covariance matrix $\Sigma_y^{-1} = \tau^{-2}(D_W - \rho W)$, but in general we have no simple form for the covariance matrix itself. The elements of $\Sigma_y$ give us, of course, information on the marginal covariance structure of $Y$. The elements of $\Sigma_y^{-1}$ give us information on the conditional covariance structure of $Y$. For example, using standard results associated with the MVN distribution, we can show that $1/(\Sigma_y^{-1})_{ii}$ gives us $\text{Var}(Y_i \mid y_j, j \neq i)$.

45 Moreover, if $(\Sigma_y^{-1})_{ij} = 0$ then $Y_i$ and $Y_j$ are conditionally independent given $\{y_k, k \neq i, j\}$. We see that $W_{ij} = 0$ implies conditional independence between $Y_i$ and $Y_j$ (given all other $Y$'s). From this we see that the specification of a neighborhood structure $W$ is essentially a set of conditional independence assumptions. Regression: If the proper CAR model is used as a distribution for data, we can accommodate covariates $x_i$ by modifying the conditional distributions to

$N(x_i'\beta + \rho \sum_j \dfrac{w_{ij}}{w_{i+}} (y_j - x_j'\beta), \dfrac{\tau^2}{w_{i+}}), \quad i = 1, \ldots, n.$

46 With these conditional specifications, the marginal distribution for $Y$ is MVN with $\mu = X\beta$ and $\Sigma_y^{-1} = \tau^{-2}(D_W - \rho W)$. We will mostly be concerned with the $\mu = 0$ case, when CAR models are applied as a (prior) distribution for random effects. Multivariate spatial data: Suppose that, associated with each areal unit, we observe several, say $p$, dependent observations $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{ip})'$. Models for these sorts of data must account for the spatial dependence across areal units and also the dependence within each $Y_i$.

47 Multivariate conditionally autoregressive (MCAR) models have been developed for such data. The idea is a straightforward extension of the univariate case, where we specify the joint distribution of all $np$ random variables $Y = (Y_1', \ldots, Y_n')'$ through a set of full conditional distributions. These full conditional distributions will be $p$-variate normal instead of univariate normal. Note also that a CAR model can, in principle, be adopted to model point-referenced data by allowing the elements of $W$ to depend on the distance between points.

48 This may be useful for very large datasets since CAR models, as we shall see in Chapter 5, are numerically less demanding to fit within a Gibbs sampling framework. When prediction is not of interest, this is a perfectly acceptable way of building a joint distribution. Whether or not such an approach yields an adequate representation of the underlying spatial structure in a given application is a model assessment issue - and a critical one at that.

49 Non-Gaussian CAR models: When dealing with non-Gaussian areal data, our preferred approach will be based on generalized linear mixed models, where we incorporate Gaussian CAR random effects into models for non-Gaussian data (Chapter 5). An alternative to this approach, which we consider now, is to adopt an MRF-type specification for the data $Y_1, \ldots, Y_n$ and determine a joint distribution through the specification of a set of compatible non-Gaussian full conditional distributions.

50 For example, we can allow the full conditional distributions $f(y_i \mid y_j, j \neq i)$ to take Poisson, binomial, gamma, or in fact any form from the exponential family. When these are compatible, the result is a joint spatial distribution for non-Gaussian data. See Cressie (1993) for a full development of CAR models in a general framework. I will present two examples of such non-Gaussian CAR models and discuss the computational problems associated with them.

51 Binary Data: For binary $Y_1, \ldots, Y_n$, an autologistic (binary MRF) model specifies the full conditional distributions through $p_i = P(Y_i = 1 \mid y_j, j \neq i) = P(Y_i = 1 \mid y_j, j \in \partial_i)$ and

$\log\left(\dfrac{p_i}{1 - p_i}\right) = x_i'\beta + \psi \sum_j w_{ij} y_j,$

where $\beta$ is a vector of regression parameters and $\psi \in \mathbb{R}$ is a spatial dependence parameter. These full conditional distributions are compatible, and Brook's lemma yields the form of the joint pmf:

$f(y_1, \ldots, y_n) \propto \exp\{\beta'(\sum_i y_i x_i) + \psi \sum_{i,j} w_{ij} y_i y_j\},$

a Gibbs distribution with potentials on cliques of order 2.

52 We can, in principle, use this form to fit the model and obtain, for example, MLEs of $\beta$ and $\psi$. Unfortunately, a computational problem arises. The normalizing constant in $f(y_1, \ldots, y_n)$ depends on the model parameters,

$f(y_1, \ldots, y_n) = C(\beta, \psi) \exp\{\beta'(\sum_i y_i x_i) + \psi \sum_{i,j} w_{ij} y_i y_j\},$

and so it would need to be evaluated at each iteration of the maximization procedure. Note that

$C(\beta, \psi)^{-1} = \sum_{y_1 = 0}^{1} \cdots \sum_{y_n = 0}^{1} \exp\{\beta'(\sum_i y_i x_i) + \psi \sum_{i,j} w_{ij} y_i y_j\}.$

53 Evaluating this constant for any particular value of $\beta$ and $\psi$ requires summing $2^n$ terms, which is not feasible even for moderate $n$, particularly since we would have to do this iteratively. Evaluating the normalizing constant is also required for Bayesian inference. Pseudo-likelihood, a somewhat ad hoc inferential scheme, can be employed to avoid calculation of the normalizing constant. The autologistic model can be generalized to the case where each $Y_i$ is categorical and takes values in the set $\{0, 1, \ldots, L-1\}$ for some $L \geq 2$.
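For a tiny map the $2^n$ sum is actually doable, which makes the source of the problem concrete. A sketch with $n = 4$ regions on a path, no covariates (a constant $b_0$ standing in for $x_i'\beta$), and arbitrary parameter values:

```python
import itertools
import numpy as np

# Tiny autologistic model on a path of n = 4 regions; b0 and psi are made up.
n = 4
W = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
b0, psi = -0.5, 0.8

def unnorm(y):
    """exp{ b0 * sum_i y_i + psi * sum_{i,j} w_ij y_i y_j } (ordered pairs)."""
    y = np.asarray(y, dtype=float)
    return np.exp(b0 * y.sum() + psi * (y @ W @ y))

# C(b0, psi)^{-1}: brute-force sum over all 2^n binary configurations.
inv_C = sum(unnorm(y) for y in itertools.product([0, 1], repeat=n))

# With the constant in hand the pmf is usable, e.g. P(all regions equal 1):
p_all_ones = unnorm(np.ones(n)) / inv_C
print(inv_C, p_all_ones)
```

The enumeration over `itertools.product` is exactly the sum that must be repeated at every step of an ML or MCMC fit; its $2^n$ cost is what motivates pseudo-likelihood.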

54 In this case the full conditional distributions are defined by

$P(Y_i = l \mid y_j, j \neq i) \propto \exp(\psi \sum_{j \in \partial_i} w_{ij} I(y_j = l)),$

where $\psi \in \mathbb{R}$ is again a spatial dependence parameter. Covariates can be added to this model just as in the autologistic case. This model, referred to as the Potts model, can be used to model allocations in finite mixture models, providing a robust alternative to the usual Gaussian spatial random effects models. As before, the model contains a normalizing constant $C(\psi)$ that causes computational problems when fitting the model.

55 Simultaneous autoregressive (SAR) models MRF models such as the CAR models we have discussed are by far the most popular sorts of models for areal data. An alternative class of models for areal data can be based on an autoregressive structure similar to that adopted in time series modeling. As before we have data Y 1,..., Y n and spatial information W. Unlike the MRF approach, we do not focus on full conditionals in this framework.

56 Instead, we start with a vector of independent errors or innovations $e \sim MVN(0, D)$ with $D = \text{diag}\{\sigma_1^2, \ldots, \sigma_n^2\}$, or more simply $D = \sigma^2 I$. We then construct a simple functional relationship between $Y$ and $e$, and this relationship induces a distribution for $Y$. Consider the relationship

$Y_i = \sum_j b_{ij} Y_j + e_i, \quad i = 1, \ldots, n,$

for some constants $b_{ij}$ with $b_{ii} = 0$.

57 In matrix form this is $Y = BY + e$, where $B = (b_{ij})$. From this we can obtain the relationship between $Y$ and $e$:

$Y = (I - B)^{-1} e,$

assuming $I - B$ is invertible. The simple distribution assigned to $e$ then induces the following for $Y$:

$Y \sim MVN(0, (I - B)^{-1} D [(I - B)^{-1}]'),$

and when $D = \sigma^2 I$ this is just $Y \sim MVN(0, \sigma^2 (I - B)^{-1} [(I - B)^{-1}]')$.

58 To ensure that $I - B$ is invertible, we can take $B = \rho W$ and restrict $\rho$ to an appropriate range. Invertibility is ensured when $\rho \in (1/\lambda_{(1)}, 1/\lambda_{(n)})$, where $\lambda_{(1)}$ and $\lambda_{(n)}$ are the smallest and largest eigenvalues of $W$. The SAR model is then based on $\Sigma_y = \sigma^2 [(I - \rho W)(I - \rho W)']^{-1}$, where $\rho$ is referred to as the autoregression parameter, with $\rho = 0$ corresponding to $\Sigma_y = \sigma^2 I$, an independence model.
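The induced SAR covariance can be assembled directly from $W$ and $\rho$. A sketch on an assumed path of 5 regions, with $\sigma^2$ and $\rho$ chosen arbitrarily inside the admissible range:

```python
import numpy as np

# SAR model Y = rho*W*Y + e, e ~ N(0, sigma^2 I), on a hypothetical 5-region path.
n = 5
W = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
sigma2, rho = 1.0, 0.3

lam = np.linalg.eigvalsh(W)
assert 1 / lam[0] < rho < 1 / lam[-1]          # rho in the invertibility range

A = np.eye(n) - rho * W
Ainv = np.linalg.inv(A)
Sigma_y = sigma2 * Ainv @ Ainv.T               # sigma^2 (I-rho W)^{-1} [(I-rho W)^{-1}]'

# The equivalent form from the text: sigma^2 [(I-rho W)(I-rho W)']^{-1}
Sigma_alt = sigma2 * np.linalg.inv(A @ A.T)
print(np.allclose(Sigma_y, Sigma_alt))
```

Note that nothing here involves full conditional distributions: the joint covariance falls straight out of the transformation of the independent innovations, which is the structural contrast with the CAR approach drawn below.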

59 Regression: When covariates are present, the SAR model can be adopted as a model for the residuals. In this case we define $U = Y - X\beta$ and assume $U$ follows a SAR model, so that

$(I - \rho W) U = e \;\Rightarrow\; (I - \rho W)(Y - X\beta) = e \;\Rightarrow\; Y = \rho W Y + (I - \rho W) X\beta + e.$

Note here that if $W = 0$ this is the standard linear model. Note also that the spatial covariance structure implied by the SAR model, just as with the CAR model, is not entirely intuitive.

60 In addition, SAR models, unlike CAR models, are not based on a set of full conditional distributions. These of course exist, but they do not have a computationally convenient form. As a result, SAR models are not well suited to model fitting using the Gibbs sampler. Finally, Cressie (1993) shows that any SAR model can be represented as a CAR model; however, the converse is not true: there exist CAR models that do not have a representation as a SAR model. Given the above, we will not consider SAR models further in this course.

61 I note, however, that the general approach of building spatial distributions using transformations of independent RVs is a simple, intuitive, and appealing approach. Other similar approaches could (and should) be explored further...


4. Joint Distributions of Two Random Variables

4. Joint Distributions of Two Random Variables 4.1 Joint Distributions of Two Discrete Random Variables Suppose the discrete random variables X and Y have supports S X and S Y, respectively. The joint

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

Maximum Likelihood Estimation

Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for

Generalized Linear Models. Today: definition of GLM, maximum likelihood estimation. Involves choice of a link function (systematic component)

Generalized Linear Models Last time: definition of exponential family, derivation of mean and variance (memorize) Today: definition of GLM, maximum likelihood estimation Include predictors x i through

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

3 Random vectors and multivariate normal distribution

3 Random vectors and multivariate normal distribution As we saw in Chapter 1, a natural way to think about repeated measurement data is as a series of random vectors, one vector corresponding to each unit.

EC 6310: Advanced Econometric Theory

EC 6310: Advanced Econometric Theory July 2008 Slides for Lecture on Bayesian Computation in the Nonlinear Regression Model Gary Koop, University of Strathclyde 1 Summary Readings: Chapter 5 of textbook.

CS395T Computational Statistics with Application to Bioinformatics

CS395T Computational Statistics with Application to Bioinformatics Prof. William H. Press Spring Term, 2010 The University of Texas at Austin Unit 6: Multivariate Normal Distributions and Chi Square The

3. Regression & Exponential Smoothing

3. Regression & Exponential Smoothing 3.1 Forecasting a Single Time Series Two main approaches are traditionally used to model a single time series z 1, z 2,..., z n 1. Models the observation z t as a

Handling missing data in large data sets. Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza

Handling missing data in large data sets Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza The problem Often in official statistics we have large data sets with many variables and

Parametric Models Part I: Maximum Likelihood and Bayesian Density Estimation

Parametric Models Part I: Maximum Likelihood and Bayesian Density Estimation Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2015 CS 551, Fall 2015

Poisson Models for Count Data

Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

Interpretation of Somers D under four simple models

Interpretation of Somers D under four simple models Roger B. Newson 03 September, 04 Introduction Somers D is an ordinal measure of association introduced by Somers (96)[9]. It can be defined in terms

MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators...

MATH4427 Notebook 2 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 2009-2016 by Jenny A. Baglivo. All Rights Reserved. Contents 2 MATH4427 Notebook 2 3 2.1 Definitions and Examples...................................

1 Portfolio mean and variance

Copyright c 2005 by Karl Sigman Portfolio mean and variance Here we study the performance of a one-period investment X 0 > 0 (dollars) shared among several different assets. Our criterion for measuring

Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

Credit Risk Models: An Overview

Credit Risk Models: An Overview Paul Embrechts, Rüdiger Frey, Alexander McNeil ETH Zürich c 2003 (Embrechts, Frey, McNeil) A. Multivariate Models for Portfolio Credit Risk 1. Modelling Dependent Defaults:

Dongfeng Li. Autumn 2010

Autumn 2010 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis

Average Redistributional Effects. IFAI/IZA Conference on Labor Market Policy Evaluation

Average Redistributional Effects IFAI/IZA Conference on Labor Market Policy Evaluation Geert Ridder, Department of Economics, University of Southern California. October 10, 2006 1 Motivation Most papers

3. The Multivariate Normal Distribution

3. The Multivariate Normal Distribution 3.1 Introduction A generalization of the familiar bell shaped normal density to several dimensions plays a fundamental role in multivariate analysis While real data

Christfried Webers. Canberra February June 2015

c Statistical Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 829 c Part VIII Linear Classification 2 Logistic

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

Handling missing data in Stata a whirlwind tour

Handling missing data in Stata a whirlwind tour 2012 Italian Stata Users Group Meeting Jonathan Bartlett www.missingdata.org.uk 20th September 2012 1/55 Outline The problem of missing data and a principled

1. χ 2 minimization 2. Fits in case of of systematic errors

Data fitting Volker Blobel University of Hamburg March 2005 1. χ 2 minimization 2. Fits in case of of systematic errors Keys during display: enter = next page; = next page; = previous page; home = first

Lab 8: Introduction to WinBUGS

40.656 Lab 8 008 Lab 8: Introduction to WinBUGS Goals:. Introduce the concepts of Bayesian data analysis.. Learn the basic syntax of WinBUGS. 3. Learn the basics of using WinBUGS in a simple example. Next

Logistic Regression (1/24/13)

STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used

Monte Carlo Simulation

1 Monte Carlo Simulation Stefan Weber Leibniz Universität Hannover email: sweber@stochastik.uni-hannover.de web: www.stochastik.uni-hannover.de/ sweber Monte Carlo Simulation 2 Quantifying and Hedging

Models for Count Data With Overdispersion

Models for Count Data With Overdispersion Germán Rodríguez November 6, 2013 Abstract This addendum to the WWS 509 notes covers extra-poisson variation and the negative binomial model, with brief appearances

Basics of Statistical Machine Learning

CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar

MS&E 226: Small Data

MS&E 226: Small Data Lecture 16: Bayesian inference (v3) Ramesh Johari ramesh.johari@stanford.edu 1 / 35 Priors 2 / 35 Frequentist vs. Bayesian inference Frequentists treat the parameters as fixed (deterministic).

Centre for Central Banking Studies

Centre for Central Banking Studies Technical Handbook No. 4 Applied Bayesian econometrics for central bankers Andrew Blake and Haroon Mumtaz CCBS Technical Handbook No. 4 Applied Bayesian econometrics

Numerical Summarization of Data OPRE 6301

Numerical Summarization of Data OPRE 6301 Motivation... In the previous session, we used graphical techniques to describe data. For example: While this histogram provides useful insight, other interesting

Generalized Linear Model Theory

Appendix B Generalized Linear Model Theory We describe the generalized linear model as formulated by Nelder and Wedderburn (1972), and discuss estimation of the parameters and tests of hypotheses. B.1

The zero-adjusted Inverse Gaussian distribution as a model for insurance claims

The zero-adjusted Inverse Gaussian distribution as a model for insurance claims Gillian Heller 1, Mikis Stasinopoulos 2 and Bob Rigby 2 1 Dept of Statistics, Macquarie University, Sydney, Australia. email:

Monte Carlo and Empirical Methods for Stochastic Inference (MASM11/FMS091)

Monte Carlo and Empirical Methods for Stochastic Inference (MASM11/FMS091) Magnus Wiktorsson Centre for Mathematical Sciences Lund University, Sweden Lecture 5 Sequential Monte Carlo methods I February

Constrained Bayes and Empirical Bayes Estimator Applications in Insurance Pricing

Communications for Statistical Applications and Methods 2013, Vol 20, No 4, 321 327 DOI: http://dxdoiorg/105351/csam2013204321 Constrained Bayes and Empirical Bayes Estimator Applications in Insurance

Lecture 14: GLM Estimation and Logistic Regression

Lecture 14: GLM Estimation and Logistic Regression Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South

Penalized regression: Introduction

Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20th-century statistics dealt with maximum likelihood

MATH2740: Environmental Statistics

MATH2740: Environmental Statistics Lecture 6: Distance Methods I February 10, 2016 Table of contents 1 Introduction Problem with quadrat data Distance methods 2 Point-object distances Poisson process case

Extracting correlation structure from large random matrices

Extracting correlation structure from large random matrices Alfred Hero University of Michigan - Ann Arbor Feb. 17, 2012 1 / 46 1 Background 2 Graphical models 3 Screening for hubs in graphical model 4

Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )

Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) and Neural Networks( 類 神 經 網 路 ) 許 湘 伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples

How to Conduct a Hypothesis Test

How to Conduct a Hypothesis Test The idea of hypothesis testing is relatively straightforward. In various studies we observe certain events. We must ask, is the event due to chance alone, or is there some

Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011

Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Name: Section: I pledge my honor that I have not violated the Honor Code Signature: This exam has 34 pages. You have 3 hours to complete this

DATA ANALYSIS II. Matrix Algorithms

DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where

The purpose of this section is to derive the asymptotic distribution of the Pearson chi-square statistic. k (n j np j ) 2. np j.

Chapter 7 Pearson s chi-square test 7. Null hypothesis asymptotics Let X, X 2, be independent from a multinomial(, p) distribution, where p is a k-vector with nonnegative entries that sum to one. That

Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University caizhua@gmail.com

Bayesian Machine Learning (ML): Modeling And Inference in Big Data Zhuhua Cai Google Rice University caizhua@gmail.com 1 Syllabus Bayesian ML Concepts (Today) Bayesian ML on MapReduce (Next morning) Bayesian

Normality Testing in Excel

Normality Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com

7 Hypothesis testing - one sample tests

7 Hypothesis testing - one sample tests 7.1 Introduction Definition 7.1 A hypothesis is a statement about a population parameter. Example A hypothesis might be that the mean age of students taking MAS113X

Lecture 3 : Hypothesis testing and model-fitting

Lecture 3 : Hypothesis testing and model-fitting These dark lectures energy puzzle Lecture 1 : basic descriptive statistics Lecture 2 : searching for correlations Lecture 3 : hypothesis testing and model-fitting

A LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA

REVSTAT Statistical Journal Volume 4, Number 2, June 2006, 131 142 A LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA Authors: Daiane Aparecida Zuanetti Departamento de Estatística, Universidade Federal de São

Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

Markov random fields and Gibbs measures

Chapter Markov random fields and Gibbs measures 1. Conditional independence Suppose X i is a random element of (X i, B i ), for i = 1, 2, 3, with all X i defined on the same probability space (.F, P).

Chapter 3 RANDOM VARIATE GENERATION

Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.

Economic Order Quantity and Economic Production Quantity Models for Inventory Management

Economic Order Quantity and Economic Production Quantity Models for Inventory Management Inventory control is concerned with minimizing the total cost of inventory. In the U.K. the term often used is stock

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

Pattern Analysis. Logistic Regression. 12. Mai 2009. Joachim Hornegger. Chair of Pattern Recognition Erlangen University

Pattern Analysis Logistic Regression 12. Mai 2009 Joachim Hornegger Chair of Pattern Recognition Erlangen University Pattern Analysis 2 / 43 1 Logistic Regression Posteriors and the Logistic Function Decision

SF2940: Probability theory Lecture 8: Multivariate Normal Distribution

SF2940: Probability theory Lecture 8: Multivariate Normal Distribution Timo Koski 24.09.2015 Timo Koski Matematisk statistik 24.09.2015 1 / 1 Learning outcomes Random vectors, mean vector, covariance matrix,

A Bootstrap Metropolis-Hastings Algorithm for Bayesian Analysis of Big Data

A Bootstrap Metropolis-Hastings Algorithm for Bayesian Analysis of Big Data Faming Liang University of Florida August 9, 2015 Abstract MCMC methods have proven to be a very powerful tool for analyzing

Web-based Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni

1 Web-based Supplementary Materials for Bayesian Effect Estimation Accounting for Adjustment Uncertainty by Chi Wang, Giovanni Parmigiani, and Francesca Dominici In Web Appendix A, we provide detailed

Stochastic Inventory Control

Chapter 3 Stochastic Inventory Control 1 In this chapter, we consider in much greater details certain dynamic inventory control problems of the type already encountered in section 1.3. In addition to the

The Variability of P-Values. Summary

The Variability of P-Values Dennis D. Boos Department of Statistics North Carolina State University Raleigh, NC 27695-8203 boos@stat.ncsu.edu August 15, 2009 NC State Statistics Departement Tech Report

How to assess the risk of a large portfolio? How to estimate a large covariance matrix?

Chapter 3 Sparse Portfolio Allocation This chapter touches some practical aspects of portfolio allocation and risk assessment from a large pool of financial assets (e.g. stocks) How to assess the risk

Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014

Probabilistic Models for Big Data Alex Davies and Roger Frigola University of Cambridge 13th February 2014 The State of Big Data Why probabilistic models for Big Data? 1. If you don t have to worry about

An Introduction to Machine Learning

An Introduction to Machine Learning L5: Novelty Detection and Regression Alexander J. Smola Statistical Machine Learning Program Canberra, ACT 0200 Australia Alex.Smola@nicta.com.au Tata Institute, Pune,

The Basics of Graphical Models

The Basics of Graphical Models David M. Blei Columbia University October 3, 2015 Introduction These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. Many figures

A Coefficient of Variation for Skewed and Heavy-Tailed Insurance Losses. Michael R. Powers[ 1 ] Temple University and Tsinghua University

A Coefficient of Variation for Skewed and Heavy-Tailed Insurance Losses Michael R. Powers[ ] Temple University and Tsinghua University Thomas Y. Powers Yale University [June 2009] Abstract We propose a

An Internal Model for Operational Risk Computation

An Internal Model for Operational Risk Computation Seminarios de Matemática Financiera Instituto MEFF-RiskLab, Madrid http://www.risklab-madrid.uam.es/ Nicolas Baud, Antoine Frachot & Thierry Roncalli

Linear Programming I

Linear Programming I November 30, 2003 1 Introduction In the VCR/guns/nuclear bombs/napkins/star wars/professors/butter/mice problem, the benevolent dictator, Bigus Piguinus, of south Antarctica penguins