Approximate Likelihoods for Spatial Processes

Petruţa C. Caragea, Richard L. Smith
Department of Statistics, University of North Carolina at Chapel Hill

KEY WORDS: Maximum likelihood for spatial processes, dimensionality reduction, relative efficiency, time series

1 Introduction

Many applications of spatial statistics involve evaluating a likelihood function over a sample with an increasing number of data locations. For example, Holland et al. (2002) analyzed the Clean Air Status and Trends Network (CASTNet) data set, developed by the U.S. Environmental Protection Agency (EPA) in conjunction with the National Oceanic and Atmospheric Administration (NOAA) to monitor air quality and supporting meteorological measurements across the United States. Established in 1987, CASTNet almost doubled its number of site locations, from 38 in 1989 (35 of them in the eastern part of the U.S.) to 70 sites across the U.S. in 2001. Holland et al. focused on establishing a spatial map of trends in two pollutants, SO2 and SO4; that is, the paper estimated the spatial parameters associated with these trends. The authors developed an algorithm based on maximum likelihood estimation: the underlying field was assumed to be Gaussian with a spatial covariance function in some given family, and evaluating the likelihood function required calculating the inverse and determinant of the covariance matrix. Although this analysis is computationally feasible for CASTNet, it would not be so for a much larger network. Experience shows that once the number of locations reaches the hundreds, the cost of computing the inverse and determinant of the covariance matrix makes maximum likelihood estimation intractable. Moreover, data sets that encompass hundreds if not thousands of monitoring sites are becoming more prevalent; for example, the Historical Climate Network (HCN) developed and maintained by NOAA now has several thousand locations. In order to retain the benefits of maximum likelihood estimation in such high-dimensional settings, it is necessary to establish efficient approximations to the likelihood.

We consider here several approximate likelihoods based on grouping the observations into clusters and building an estimating function that accounts for variability both between and within clusters. Theoretical results derived for an analogous time series problem allow us to compare the three approximation schemes. These results rest on the general idea that the variance of each alternative estimator can be obtained from the information sandwich principle, once the derivatives of the quasi-likelihood function have been expressed as quadratic forms of independent normal random variables. We conclude by illustrating the new method with simulations.

This paper is made up of three parts. The first part presents the general theoretical methodology used to calculate the asymptotic variances of the proposed estimators. The second part describes the general spatial model and the practical details of the proposed approximation schemes. Since it is practically intractable to compute theoretically the performance of these general estimators, we analyze in detail the analogous one-dimensional problem; the third part therefore consists of a thorough theoretical analysis of the following three estimators.

The first estimator is called Big Blocks.
We start by considering the mean value of each block. The proposed estimating function in this case is simply the likelihood of the block averages, and maximizing this function leads to the proposed estimator. We call the second estimator Small Blocks. In this case the theoretical derivations are performed under the assumption that the blocks are independent. We calculate the likelihood function for each block, which is readily available since the original covariance structure is known, and the function to be maximized is the product of the individual block likelihoods. The last estimator is based on a combination of the two aforementioned schemes. Naturally, we expect the Big Blocks estimator to exhibit some loss of efficiency because each block is represented only through its mean, while the assumption of independence between blocks in the second case will also lead to some efficiency reduction, although not as large as in the first instance. We construct the Hybrid estimating function in a few steps. First, compute the block means and consider their likelihood function. Then assume that, given the block means, the blocks are independent. Although this assumption cannot always be verified in practice, it is a reasonable working assumption. The estimating function is therefore the product of the likelihood of the block means and the individual conditional block likelihoods. Clearly this is not an exact likelihood, because of the conditional independence assumption.

As a measure of performance, we define the relative efficiency as the ratio between the asymptotic variance of the classical MLE and that of the alternative estimator. To compute the asymptotic variance of the various estimators, we use the expansion method. As a check, we compare the theoretical values obtained through the expansion method with the relative efficiencies of the three estimators on simulated data sets. We conclude by presenting a possible theoretical extension of the time series problem to its spatial equivalent and describe a promising approach for the more general case.

2 The Expansion Method

The main novelty in this work is the approximation to the likelihood, which leads to a certain dimensionality reduction. Another non-standard element is the calculation of the asymptotic variance of the alternative estimators; throughout this paper we refer to this technique as the expansion method. Since it lies at the basis of our theoretical calculations, we outline here the main principles.

The most serious complication in calculating the asymptotic variance of any of the proposed estimators comes from the fact that they are derived from quasi-likelihood functions, so the standard Fisher information approach does not lead to correct results. Liang and Zeger (1986) proposed a solution to this problem, a technique known as the information sandwich approach. Suppose we have a statistical model indexed by a finite-dimensional parameter θ, and suppose an estimate θ̂_n is constructed by minimizing a criterion function S_n(θ). We assume the true parameter value is θ_0 and that θ̂_n is a consistent estimator. We also assume that S_n(θ) is twice continuously differentiable in θ, and that its underlying distribution is sufficiently smooth so that the function H(θ), defined below, is continuous in a neighborhood of θ_0. For any function f, let ∇f(θ) denote the vector of first-order partial derivatives of f with respect to the components of θ, and ∇²f the matrix of second-order partial derivatives. We assume:

(SA1) (1/n) ∇²S_n(θ) →_p H(θ) as n → ∞, uniformly on some neighborhood of θ_0, where H(·) is a matrix-valued function, continuous near θ_0, with H(θ_0) invertible;

(SA2) n^{-1/2} ∇S_n(θ_0) →_d N(0, V(θ_0)) for some covariance matrix V(θ_0).

With these conditions satisfied, we can apply the Slutsky lemma together with a Taylor expansion to conclude that

√n (θ̂_n − θ_0) →_d N(0, H(θ_0)^{-1} V(θ_0) H(θ_0)^{-1}).

Therefore, we need to be able to compute the variance of the first derivative of the minimizing criterion, as well as the expected value of the second derivative. We solve this problem by employing a corollary of the Martingale Central Limit Theorem (MCLT), namely its application to quadratic forms of independent normal random variables. Consider the sequence

S_n = Σ_{i ≤ j} a_{n,i,j} ξ_i ξ_j,

where the {ξ_i} are independent N[0, 1] and the coefficients {a_{n,i,j}} are defined for each n. Define

m_n = E{S_n} = Σ_i a_{n,i,i}   and   v_n = Var{S_n} = 2 Σ_i a_{n,i,i}² + Σ_{i < j} a_{n,i,j}².

Theorem (see Billingsley, 1995, page 476): Suppose

(A1) max_i a_{n,i,i}² / v_n → 0 as n → ∞,

(A2) max_k ( Σ_{i < k} a_{n,i,k}² ) / v_n → 0 as n → ∞.

Then (S_n − m_n)/√v_n →_d N[0, 1].

To summarize, the name expansion technique comes from the fact that the functions of interest, ∇S_n, are typically quadratic forms in the spatial process, which in turn can be expanded into quadratic forms of the independent normal random variables ξ_i. This enables us to use the MCLT to derive their variance and hence apply the information sandwich principle to obtain the asymptotic distribution of the alternative estimator.
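To make the two ingredients above concrete, here is a small numerical sketch (ours, not part of the paper; all values are illustrative) that checks the formulas for m_n and v_n by Monte Carlo and then combines an H and a V in the sandwich form H^{-1} V H^{-1}. In the sections below, the real work is in deriving the coefficients a_{n,i,j} for each approximate likelihood; the sandwich step itself is routine.

```python
import numpy as np

rng = np.random.default_rng(0)

# Coefficients a_{n,i,j}, i <= j, stored as an upper-triangular matrix.
n = 50
A = np.triu(rng.normal(size=(n, n)) / n)

# Theoretical mean and variance of S_n = sum_{i<=j} a_{n,i,j} xi_i xi_j.
diag = np.diag(A)
off = A[np.triu_indices(n, k=1)]
m_n = diag.sum()                                 # E[S_n]   = sum_i a_ii
v_n = 2 * (diag ** 2).sum() + (off ** 2).sum()   # Var[S_n] = 2 sum a_ii^2 + sum_{i<j} a_ij^2

# Monte Carlo check of the two formulas.
xi = rng.normal(size=(100_000, n))
S = np.sum((xi @ A) * xi, axis=1)                # xi^T A xi for each replicate
print(m_n, S.mean())                             # the two numbers should agree closely
print(v_n, S.var())                              # likewise

# Information sandwich: with H the limiting (scaled) Hessian and V the
# asymptotic covariance of the scaled score, the estimator variance is
# H^{-1} V H^{-1}.  The numbers below are illustrative only.
H = np.array([[2.0]])
V = np.array([[3.0]])
print(np.linalg.inv(H) @ V @ np.linalg.inv(H))   # sandwich variance, here 0.75
```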
3 Spatial Setting

3.1 Spatial Models

The basic object we consider is a stochastic process {Z(s), s ∈ D}, where D is a subset of R^d (d-dimensional Euclidean space), usually though not necessarily with d = 2. For example, Z(s) may represent the daily quantity of SO2 measured at a specific location s. Let µ(s) = E[Z(s)], s ∈ D, denote the mean value at location s. We also assume that the variance of Z(s) exists for all s ∈ D. In this work we analyze Gaussian processes. We usually assume second-order stationarity, though many of the results hold under the weaker intrinsic stationarity assumption. Since we assume that we are sampling from a Gaussian process, we can write down the exact likelihood function, which we subsequently maximize numerically with respect to the unknown parameters.

Without any major change in the methodology, we can also incorporate linear regression terms in the model, which becomes

Z ∼ N(Xβ, Σ),   (1)

where Z is an n-dimensional vector of observations, X an n × q matrix of known regressors, β a q-vector of unknown regression parameters and Σ the covariance matrix of the observations. In many applications we may assume

Σ = α² V(θ),   (2)

where α² is an unknown scale parameter and V(θ) is a matrix of standardized covariances determined by the unknown parameter vector θ. With Z defined by (1), the negative log likelihood is given by

l(β, α², θ) = (n/2) log(2π) + (n/2) log α² + (1/2) log|V(θ)| + (1/(2α²)) (Z − Xβ)^T V(θ)^{-1} (Z − Xβ).   (3)

The traditional approach to this optimization problem is a two-stage process based on the Cholesky decomposition of the covariance matrix: the first stage computes the least squares estimator of β, while the second stage uses this estimator to perform a numerical maximization with respect to the other parameters. Since the number of operations needed to calculate the inverse and the determinant of an n × n covariance matrix is of order n³, we expect serious delays in obtaining results for large data sets. With the growing interest in monitoring and analyzing ozone and particulate matter over the U.S., where data may be collected at a very large number of sites several times daily, computational problems become more and more acute.
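As an illustration of the computation just described, the following sketch (ours, not the authors' code) evaluates the negative log likelihood (3) through a Cholesky factorization; the exponential correlation family used for V(θ) and all names are assumptions made only for this example. The O(n³) cost sits entirely in the factorization, which is what the approximations below are designed to avoid.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import cho_factor, cho_solve

def neg_log_lik(beta, alpha2, theta, Z, X, coords):
    """Exact Gaussian negative log likelihood (3), with Sigma = alpha2 * V(theta).

    V(theta) is taken here, purely for illustration, to be the exponential
    correlation matrix exp(-d / theta); the O(n^3) cost is the Cholesky step.
    """
    n = len(Z)
    V = np.exp(-cdist(coords, coords) / theta)
    c, low = cho_factor(V)                            # O(n^3)
    log_det_V = 2.0 * np.sum(np.log(np.diag(c)))
    r = Z - X @ beta
    quad = r @ cho_solve((c, low), r)
    return 0.5 * (n * np.log(2 * np.pi) + n * np.log(alpha2)
                  + log_det_V + quad / alpha2)

# Toy usage on a few hundred synthetic sites.
rng = np.random.default_rng(1)
coords = rng.uniform(size=(200, 2))
X = np.column_stack([np.ones(200), coords])
Z = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=200)
print(neg_log_lik(np.array([1.0, 0.5, -0.5]), 1.0, 0.3, Z, X, coords))
```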

As the exact maximum likelihood function becomes intractable in such instances, we shall consider the three alternative approximations to the estimating function mentioned before: Big Blocks, Small Blocks and Hybrid. Each is based on the idea of clustering the sampling sites into a given number of groups (say b) of approximately equal size (say k).

For the Big Blocks estimator, we first compute the cluster means and then take their likelihood as the optimization criterion. We expect that summarizing the entire within-cluster correlation in a single component, the cluster mean, will lead to a loss of efficiency in some cases, especially for large cluster sizes.

For the Small Blocks estimator, we compute the quasi-likelihood function as the product of the individual cluster likelihoods. We assume the within-cluster correlation structure is known, belonging to some parametric family. The underlying assumption here is that the clusters are independent, which will induce some efficiency loss (especially for small cluster sizes), although we expect it to be less serious than in the previous case.

To give a general idea of the computational efficiency of the Hybrid estimator, we describe not only the algorithm we follow but also the approximate number of calculations required. This estimation technique accounts for both within- and between-cluster correlation, so we expect it to be superior to both aforementioned methods. We proceed as follows:

1. Calculate the cluster means and evaluate their joint likelihood. To do so, we need to form the b × b covariance matrix of the cluster means, each entry of which requires approximately k² steps, and then compute its inverse via the Cholesky decomposition of a b × b matrix, which requires O(b³) steps. In summary, the number of evaluation steps required here is O(b²k² + b³). If b = n^{2/3}, this is of order O(n²), compared with O(n³) for the full likelihood calculation, and this is the best possible rate for an estimator of this form.

2. Conditionally on the cluster means, compute the individual cluster joint likelihoods. This is an O(k³) operation repeated b times, hence O(bk³) evaluations, which is of the same or smaller order than the first step if b is at least of order n^{1/2}.

3. Finally, compute the quasi-likelihood function by multiplying the above b + 1 likelihood components. This is the function to be maximized in the estimation process.

3.2 Modified Algorithm: Practical Issues

This section illustrates some of the practical details involved in the strategy described above. The first step is to cluster the data into a number of blocks with approximately the same number of elements. We employ a classical clustering procedure based on the latitude and longitude of each sampling location, and denote by b the number of clusters and by k the cluster size. Most of the details that follow are specific to the Big Blocks and Hybrid estimators, but one should proceed in a similar fashion for the Small Blocks estimator.
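The paper does not prescribe a particular clustering procedure; as a purely illustrative sketch (ours), the b groups could be formed with k-means on the site coordinates, bearing in mind that plain k-means gives only approximately balanced cluster sizes.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_sites(coords, b, seed=0):
    """Group site coordinates (longitude, latitude) into b clusters.

    Plain k-means does not enforce equal cluster sizes, so the blocks come
    out only approximately balanced, which is all the method requires.
    """
    labels = KMeans(n_clusters=b, n_init=10, random_state=seed).fit_predict(coords)
    return [np.flatnonzero(labels == j) for j in range(b)]

coords = np.random.default_rng(2).uniform(size=(500, 2))
blocks = cluster_sites(coords, b=25)
print(sorted(len(ix) for ix in blocks))   # roughly n/b = 20 sites per block
```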
The first step is to compute the block averages. Define Z̄ = {Z̄_1, ..., Z̄_b} as the vector of cluster means, where Z̄_i denotes the mean of cluster i. We assume that the new process Z̄ is Gaussian, with mean µ̄ and covariance matrix Σ̄. Thus the negative log likelihood for the cluster means is of the form

l_means(β, θ) = (b/2) log(2π) + (1/2) log|Σ̄(θ)| + (1/2) (Z̄ − µ̄)^T Σ̄(θ)^{-1} (Z̄ − µ̄).   (4)

Note that for any i with 1 ≤ i ≤ b we can express

µ̄_i = E[Z̄_i] = (1/k) Σ_{j=1}^{k} X_{(i−1)k+j} β,   (5)

and we let µ̄ = {µ̄_1, ..., µ̄_b} be the corresponding mean vector. Next we compute the covariance matrix of the cluster-means process: for any i and j,

σ̄_ij = Cov[Z̄_i, Z̄_j] = (1/k²) Σ_{l=1}^{k} Σ_{l'=1}^{k} σ_{(i−1)k+l, (j−1)k+l'},

and we define Σ̄ = (σ̄_ij)_{1 ≤ i,j ≤ b}.

Next we need the conditional likelihood of each block given its mean. We first consider the joint density of the first k − 1 observations in each block together with the corresponding block mean. For all 1 ≤ i ≤ b, the vector (Z_{(i−1)k+1}, ..., Z_{ik−1}, Z̄_i)^T is normally distributed with mean (µ_i, µ̄_i) and covariance matrix

( Σ_i    τ_i  )
( τ_i^T  σ̄_ii ),

where µ̄_i is given by (5),

µ_i^T = {µ_{(i−1)k+1}, ..., µ_{ik−1}},   σ_{jj'} = Cov[Z_{(i−1)k+j}, Z_{(i−1)k+j'}],   Σ_i = (σ_{jj'})_{1 ≤ j, j' ≤ k−1},

and τ_i = {τ_{(i−1)k+1}, ..., τ_{ik−1}} with τ_{(i−1)k+j} = Cov[Z̄_i, Z_{(i−1)k+j}]. Standard multivariate normal results give the conditional joint density of Z_{(i−1)k+1}, ..., Z_{ik−1} given Z̄_i as

N( µ_i + τ_i σ̄_ii^{-1} (Z̄_i − µ̄_i),  Σ_i − τ_i σ̄_ii^{-1} τ_i^T ).

Denote by µ_ci and Σ_ci(θ) the conditional mean and conditional covariance matrix given the block mean, for the i-th block. Then, in analogy with equation (3), we obtain the conditional negative log likelihoods

l_ci(β, θ) = ((k−1)/2) log(2π) + (1/2) log|Σ_ci(θ)| + (1/2) (Z_i − µ_ci)^T Σ_ci(θ)^{-1} (Z_i − µ_ci),   (6)

where Z_i here denotes the vector of the first k − 1 observations in block i. The last step of the algorithm is to multiply the individual likelihoods (4) and (6) or, equivalently, to sum the b + 1 individual negative log likelihoods. Thus the estimating function has the form

l_full = (1/2) [ bk log(2π) + log|Σ̄(θ)| + Σ_{i=1}^{b} log|Σ_ci(θ)| + (Z̄ − µ̄)^T Σ̄(θ)^{-1} (Z̄ − µ̄) + Σ_{i=1}^{b} (Z_i − µ_ci)^T Σ_ci(θ)^{-1} (Z_i − µ_ci) ].   (7)
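Assembling (4) and (6) into (7) can be written compactly. The sketch below (our illustration, with hypothetical names, not the authors' code) evaluates the quasi negative log likelihood directly from a mean vector, a covariance matrix and a list of blocks, conditioning each block on its own mean exactly as above; the modified algorithm that follows avoids this direct evaluation by working only with b × b and (k−1) × (k−1) factors.

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

def hybrid_neg_log_lik(Z, mu, Sigma, blocks):
    """Quasi negative log likelihood (7): the block-means term (4) plus the
    conditional block terms (6).  'blocks' is a list of index arrays.  This is
    a direct, unoptimized evaluation (no reuse of Cholesky factors)."""
    b = len(blocks)
    A = np.zeros((b, len(Z)))                    # row i averages block i
    for i, ix in enumerate(blocks):
        A[i, ix] = 1.0 / len(ix)
    Zbar, mubar = A @ Z, A @ mu
    Sigbar = A @ Sigma @ A.T                     # (sigma_bar_ij)
    nll = -mvn(mean=mubar, cov=Sigbar).logpdf(Zbar)          # term (4)
    for i, ix in enumerate(blocks):
        head = ix[:-1]                           # first k-1 observations of block i
        S_i = Sigma[np.ix_(head, head)]
        tau = Sigma[np.ix_(head, ix)].mean(axis=1)   # Cov[Z_head, Zbar_i]
        s_ii = Sigbar[i, i]
        mu_c = mu[head] + tau * (Zbar[i] - mubar[i]) / s_ii
        Sig_c = S_i - np.outer(tau, tau) / s_ii
        nll -= mvn(mean=mu_c, cov=Sig_c).logpdf(Z[head])     # terms (6)
    return nll

# Toy usage with an AR(1)-type covariance over n = 100 locations, k = 10.
rng = np.random.default_rng(3)
n, k = 100, 10
Sigma = 0.8 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Z = rng.multivariate_normal(np.zeros(n), Sigma)
blocks = [np.arange(i, i + k) for i in range(0, n, k)]
print(hybrid_neg_log_lik(Z, np.zeros(n), Sigma, blocks))
```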

In what follows we give a rough sketch of the Hybrid estimation scheme.

MODIFIED ALGORITHM

1. Note first that we can assume, without any loss of generality, that Σ = α² V(θ) and hence Σ̄ = α² V̄(θ). For the current value of θ, compute V̄ = V̄(θ) and V_ci = V_ci(θ), and perform the Cholesky decompositions V̄ = L̄ L̄^T and V_ci = L_ci L_ci^T for all i = 1, ..., b.

2. Calculate L̄^{-1} and L_ci^{-1} for all i = 1, ..., b (straightforward, since these are lower triangular matrices).

3. Calculate |L̄| and |L_ci|, which are simply the products of the diagonal entries of L̄ and L_ci respectively, for all i = 1, ..., b; thus |V̄| = |L̄|² and |V_ci| = |L_ci|².

4. Compute the transformed variables Z̄* = L̄^{-1} Z̄ and X̄* = L̄^{-1} X̄. Also compute Z*_ci = L_ci^{-1} Z_i and hence the transformed conditional mean µ*_ci, for all i = 1, ..., b.

5. This step calculates the GLS estimator of β. The problem that arises here is that one would need the inverse of the original covariance matrix V, which is a prohibitive operation if n is too large. One possible reduction of this problem would be to use instead the covariance matrix of the joint distribution of all clusters under the working assumption that they are independent given the cluster means. Since an efficient way to invert this approximating matrix is not yet known, another current option is to use yet another approximate estimator, say

β̃ = ( β̂_mean + Σ_j β̂_block_j ) / (b + 1),

where β̂_mean and β̂_block_j are the least squares estimators for the means process and for the conditional block processes, respectively. In some simple cases, one can analytically minimize the estimating function with respect to α² by setting

α̂²(θ) = ( Ḡ²(θ) + Σ_{i=1}^{b} G²_ci(θ) ) / n,

where the G² terms denote the corresponding sums of squares.

6. Define the profile negative log likelihood as

g(θ) = (n/2) log(2π) + (n/2) log[ ( Ḡ²(θ) + Σ_{i=1}^{b} G²_ci(θ) ) / n ] + (1/2) [ log|V̄(θ)| + Σ_{i=1}^{b} log|V_ci(θ)| ] + n/2,

so that g is the function to be minimized.

7. Repeat steps 1-6 for each value of θ at which g has to be evaluated. The minimum is eventually achieved at a point θ̂, and this defines the approximate MLE.

4 Time Series Setting

It quickly becomes clear that obtaining a theoretical approximation of the asymptotic variance of the alternative estimators in this generality is rather tedious. To avoid some of the complications due to the generality of that approach, we consider a simpler case that preserves the characteristics of the original problem. We perform the complete calculations for the first-order autoregressive time series, AR(1),

X_n = φ X_{n−1} + ε_n,

where the {ε_n} are independent N[0, σ_ε²]. This case is particularly appealing since it allows us to rewrite the quasi-likelihood function as a quadratic form of independent normal random variables. Although apparently simple, carrying out the complete calculations turns out to be rather involved. In this section we present the general ideas and a series of important computational details for the time series problem.

4.1 Big Blocks Method

For the first method, divide the original time series into b blocks of k observations each. Compute the mean of each block and denote by {X̄_m} the means time series; in other words, X̄_m is the average of the observations in the m-th block, and we let γ̄_{m−1} = Cov[X̄_1, X̄_m]. Through routine algebraic manipulations we obtain that {X̄_m} has the following covariance structure:

γ̄_m = [ (2φ^{k+1} − 2φ − kφ² + k) / (k² (1−φ)² (1−φ²)) ] σ_ε²    if m = 0,
γ̄_m = [ φ (1 − φ^k)² / (k² (1−φ)² (1−φ²)) ] σ_ε²                 if m = 1,
γ̄_m = (φ^k)^{m−1} γ̄_1                                            if m ≥ 2.   (8)

It is interesting to remark that this covariance structure corresponds to an ARMA(1,1) process.

We first calculate the likelihood function for the means time series, using the covariance structure derived in equation (8), and compute the variance of the resulting estimator using the information sandwich technique. Denoting by V_ave the covariance matrix of the means process, the likelihood function is

L_ave = ( (2π)^{b/2} |V_ave|^{1/2} )^{-1} exp{ −(1/2) X̄^T V_ave^{-1} X̄ }.   (9)

Define

V* = ∂V_ave^{-1}/∂φ = (v*_ij)_{1 ≤ i,j ≤ b}

and assume that σ_ε² is known (this assumption has little bearing on the final result and considerably simplifies the computations). The first derivative of the negative log likelihood, modulo fixed constants, is then

∂l(φ)/∂φ = (1/2) ∂ log|V_ave|/∂φ + (1/(2k²)) Σ_{i=1}^{b} Σ_{j=1}^{b} v*_ij Σ_{l=1}^{k} Σ_{m=1}^{k} x_{(i−1)k+l} x_{(j−1)k+m}.

There is no apparent closed-form solution obtained by equating this derivative to zero.
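As a numerical sanity check (ours), the closed form (8) can be coded and compared against a brute-force computation of the block-mean covariances from the underlying AR(1) covariance; the same matrix V_ave then yields the Big Blocks criterion derived from (9). All function names are ours.

```python
import numpy as np
from scipy.linalg import toeplitz

def block_mean_cov(phi, sigma_eps2, b, k):
    """Covariance matrix V_ave of the b block means of an AR(1) series,
    assembled from the closed form (8)."""
    denom = k**2 * (1 - phi)**2 * (1 - phi**2)
    g0 = (2 * phi**(k + 1) - 2 * phi - k * phi**2 + k) * sigma_eps2 / denom
    g1 = phi * (1 - phi**k)**2 * sigma_eps2 / denom
    g = g1 * (phi**k) ** np.arange(-1, b - 1)     # g[m] = (phi^k)^(m-1) * g1 for m >= 1
    g[0] = g0
    return toeplitz(g)

def big_blocks_nll(phi, xbar, k, sigma_eps2=1.0):
    """Negative log of the block-means likelihood (9)."""
    V = block_mean_cov(phi, sigma_eps2, len(xbar), k)
    _, logdet = np.linalg.slogdet(V)
    return 0.5 * (len(xbar) * np.log(2 * np.pi) + logdet
                  + xbar @ np.linalg.solve(V, xbar))

# Brute-force check of (8) against the underlying AR(1) covariance (k = 10).
phi, k = 0.6, 10
Sx = toeplitz(phi ** np.arange(2 * k)) / (1 - phi**2)   # Cov of x_1, ..., x_2k
W = np.zeros((2, 2 * k))
W[0, :k] = 1.0 / k
W[1, k:] = 1.0 / k
print(np.allclose(W @ Sx @ W.T, block_mean_cov(phi, 1.0, 2, k)))   # True
```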
We therefore compute the variance of the first derivative of the negative log likelihood and the expected value of its second derivative. To do so, we rewrite the quadratic form using the representation of x_i in terms of the innovations of the AR(1) process,

x_i = σ_ε Σ_{r=−∞}^{i} φ^{i−r} ξ_r,

where the ξ_r are independent N[0, 1]. It follows that

S_n = Σ_{i=1}^{b} Σ_{j=1}^{b} v*_ij Σ_{l=1}^{k} Σ_{m=1}^{k} x_{(i−1)k+l} x_{(j−1)k+m}
    = σ_ε² Σ_{i=1}^{b} Σ_{j=1}^{b} Σ_{l=1}^{k} Σ_{m=1}^{k} v*_ij Σ_{r=−∞}^{(i−1)k+l} Σ_{s=−∞}^{(j−1)k+m} φ^{(i+j−2)k+l+m−r−s} ξ_r ξ_s.

Carefully rearranging and combining the above coefficients, S_n is equivalent to

S_n = Σ_r a_rr ξ_r² + 2 Σ_{r<s} a_rs ξ_r ξ_s.

Next we apply the Martingale Central Limit Theorem to S_n, which is a quadratic form of independent normal random variables, and obtain

m_n = E[S_n] = Σ_r a_rr = 0   and   v_n = Var[S_n] = 2 Σ_r a_rr² + 4 Σ_{r<s} a_rs².

In a similar fashion we calculate the expected value of the second derivative of the quasi-likelihood function and apply the information sandwich formula to obtain an approximation to the variance of the estimator φ̂_1.

The covariance matrix in this case has a very complex structure, and finding its inverse is a nontrivial exercise. One could invert it numerically, but since that would introduce additional error into the calculations, we prefer a strategy that exploits the fact that it is a particular case of a Toeplitz matrix. Trench (1964) developed an algorithm that calculates the exact inverse of any Toeplitz matrix; it is faster than the traditional approach based on the Cholesky decomposition, requiring on the order of b² operations when the matrix has b rows and columns. We then use numerical methods to calculate the first and second derivatives of the quasi-likelihood function.

Given the intractable analytical structure of the relative efficiency, we analyze its values numerically for a few particular cases. We proceed in the following manner. After computing the elements of the matrix V*, we evaluate each of the coefficients a_rs. Each coefficient consists of finite sums only, so its evaluation is routine algebra. The next step is to calculate the sum of these coefficients over all values of r and s, taking advantage of the fact that both indices have a finite upper bound. Looking more closely at the structure of the coefficients, we distinguish two cases: for r, s ≥ 2 we need to evaluate a finite sum, while when r or s is less than or equal to 1 we can separate the sums containing r and s from the other sums and compute the resulting infinite sums (over r and s) as geometric series. In the end we combine all these sums to obtain the final result.

The following table illustrates the performance of the Big Blocks estimator (relative to the classical MLE) for different values of b, k and φ, and also provides a check on the theoretical results by comparing them with analogous results obtained through simulations.

φ        b=5, k=100            b=50, k=10
         Theory     Sim        Theory     Sim

It is clear at this point that this approximation to the likelihood does not lead to an efficient estimator. Although appealing for its simplicity and considerable dimension reduction, it is inefficient for even moderate block sizes, which makes it unfit for realistic problems. We therefore need to alter the way we compute the minimizing criterion and to take the underlying correlation structure into account more adequately, which leads us to the next technique.

4.2 Small Blocks Method

This method ignores the correlation between blocks but takes into account the true dependence structure within blocks. We construct the quasi-likelihood as the product of the b individual block likelihoods. Since the original process is AR(1), it is immediate that the quasi-likelihood function has the form

L_blk = ( (2π)^{bk/2} |V_bl|^{b/2} )^{-1} exp{ −(1/2) Σ_{j=1}^{b} X_bl,j^T V_bl^{-1} X_bl,j },

where V_bl is the common within-block covariance matrix and X_bl,j the vector of observations in block j.

To calculate the asymptotic variance, we follow the expansion method: we rewrite the first derivative of the negative log likelihood as a quadratic form of normal random variables. Modulo fixed constants, this equals

S_n = σ_ε² [ Σ_{s ≠ Mk+1} Σ_{r=−∞}^{s−1} φ^{s−r−1} ξ_r ξ_s + Σ_{m=0}^{b−1} Σ_{r=−∞}^{mk+1} Σ_{s=−∞}^{mk+1} φ^{2mk+3−r−s} ξ_r ξ_s ],

where the first sum runs over those indices s that are not the first observation of a block. To compute the asymptotic variance we again use the information sandwich technique.
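A short sketch of the Small Blocks criterion in the AR(1) case (our illustration; all names are ours): the within-block covariance is the exact AR(1) Toeplitz matrix, and the linear solves use a Levinson-type O(k²) routine, in the spirit of the Toeplitz remark above.

```python
import numpy as np
from scipy.linalg import solve_toeplitz, toeplitz

def small_blocks_nll(phi, x, b, k, sigma_eps2=1.0):
    """Quasi negative log likelihood for the Small Blocks method: the sum of
    the b exact AR(1) block likelihoods, with the blocks treated as
    independent."""
    col = sigma_eps2 * phi ** np.arange(k) / (1 - phi**2)   # first column of V_bl
    _, logdet = np.linalg.slogdet(toeplitz(col))
    nll = 0.0
    for j in range(b):
        xj = x[j * k:(j + 1) * k]
        nll += 0.5 * (k * np.log(2 * np.pi) + logdet
                      + xj @ solve_toeplitz(col, xj))       # O(k^2) solve
    return nll

# Toy usage: simulate an AR(1) path and profile the criterion over phi.
rng = np.random.default_rng(4)
phi_true, b, k = 0.7, 50, 10
x = np.empty(b * k)
x[0] = rng.normal() / np.sqrt(1 - phi_true**2)
for t in range(1, b * k):
    x[t] = phi_true * x[t - 1] + rng.normal()
grid = np.linspace(0.30, 0.95, 14)
print(grid[np.argmin([small_blocks_nll(p, x, b, k) for p in grid])])   # near 0.7
```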
Unfortunately, even in this simple case the calculations are far from trivial. The final expression for the relative efficiency is not simple enough to allow us to study its limiting behavior analytically, so we compute it for a few particular cases. The following table compares values of the relative efficiency derived through the theoretical approach described above with values obtained through simulations.

φ        b=5, k=100            b=50, k=10
         Theory     Sim        Theory     Sim

From this table we note that the performance of the Small Blocks estimator is quite good, clearly improved compared with the previous case. However, one can think of instances in which the assumption of independence between blocks would be too strong. Next we relax this assumption and construct the third approximation to the likelihood.

4.3 Hybrid Method

As mentioned earlier, the assumption here is that, given the block means, the blocks are independent. We use the likelihood of the means, already developed for the Big Blocks method (see equation (9)), together with the conditional likelihoods of each block given its block mean. The only difficulty might be in computing the conditional means and covariances; taking advantage of the special structure of the AR(1) process and applying standard multivariate normal results, one obtains a rather simple form for this matrix and its inverse. Since the calculation of the quasi-likelihood function involves both the likelihood of the means and the block conditional likelihoods, we again need the expansion method. The estimating function in this case is

l_hyb = (1/2) bk log(2π) + (1/2) [ log|V_ave(φ)| − b log|W(φ)| + X̄^T V_ave(φ)^{-1} X̄ + Σ_{j=1}^{b} (X_j^{k−1} − µ_j(φ))^T W(φ) (X_j^{k−1} − µ_j(φ)) ],

where W denotes the inverse of the block conditional covariance matrix, µ_j the block conditional mean, and X_j^{k−1} the vector of the first k − 1 observations in block j. Rearranging the sums, one can rewrite the gradient and Hessian of the quasi-likelihood function as quadratic forms of independent normal random variables.
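The simulation columns of the tables in this section can be reproduced in spirit by Monte Carlo. The sketch below (ours, with hypothetical names, and restricted for brevity to the exact MLE versus the Small Blocks estimator) estimates a relative efficiency as the ratio of sampling variances across repeated AR(1) samples.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def ar1_nll(x, phi):
    """Exact AR(1) negative log likelihood (sigma_eps^2 = 1), up to constants."""
    return 0.5 * (-np.log(1 - phi**2) + (1 - phi**2) * x[0]**2
                  + np.sum((x[1:] - phi * x[:-1])**2))

def fit(nll):
    return minimize_scalar(nll, bounds=(-0.99, 0.99), method="bounded").x

rng = np.random.default_rng(5)
phi_true, b, k, reps = 0.7, 50, 10, 200
mle, small = [], []
for _ in range(reps):
    x = np.empty(b * k)
    x[0] = rng.normal() / np.sqrt(1 - phi_true**2)
    for t in range(1, b * k):
        x[t] = phi_true * x[t - 1] + rng.normal()
    mle.append(fit(lambda p: ar1_nll(x, p)))
    # Small Blocks: exact AR(1) likelihood of each block, blocks independent.
    small.append(fit(lambda p: sum(ar1_nll(x[j * k:(j + 1) * k], p)
                                   for j in range(b))))
print("estimated relative efficiency (Var MLE / Var Small Blocks):",
      np.var(mle) / np.var(small))
```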

The following table presents the theoretical and simulation-based values of the asymptotic efficiency for the Hybrid estimator:

φ        b=5, k=100            b=50, k=10
         Theory     Sim        Theory     Sim

Note that both the Hybrid and the Small Blocks estimators perform very well, much better than the Big Blocks estimator. At this point it is not clear whether the Hybrid estimator always performs better than the Small Blocks estimator.

5 Extension to a Lattice Sampled Process

The theoretical computations required to derive the asymptotic variance of the proposed estimators in the spatial case are extremely involved. We therefore illustrate here how one would extend the strategy used for the one-dimensional time series problem to an analogous spatial process. Consider a spatial process on an integer lattice, denoted x_ij, where i and j are integers. Since we model the process through its covariance structure, one of the simplest forms to consider is the Kronecker product form,

Cov[x_ij, x_kl] = γ^{(1)}_{ik} γ^{(2)}_{jl},   (10)

where γ^{(1)} and γ^{(2)} are the covariances of one-dimensional time series in the horizontal and vertical directions. If we assume that these are both of AR(1) form, with the same autoregressive parameter, then we deduce

Cov[x_ij, x_kl] = σ_x² φ^{|i−k| + |j−l|},   (11)

where |φ| < 1 for stationarity. An equivalent definition, which represents (11) as a function of an array of independent N[0, 1] random variables {ξ_ij}, is the formula

x_ij = σ_x (1 − φ²) Σ_{r=0}^{∞} Σ_{s=0}^{∞} φ^{r+s} ξ_{i+r, j+s}.   (12)

We may also represent the process equivalently by

x_ij − φ (x_{i+1,j} + x_{i,j+1}) + φ² x_{i+1,j+1} = ε_ij,   (13)

where the ε_ij = σ_x (1 − φ²) ξ_ij are independent N[0, σ_ε²], with σ_ε² = σ_x² (1 − φ²)². In Kronecker product notation, the covariance matrix of the process is U ⊗ U, and the inverse covariance matrix is U^{-1} ⊗ U^{-1}, where U is simply the AR(1) covariance matrix. Note that the processes defined here lie within the general class of spatial processes on lattices first studied by Whittle (1954).

We now consider maximum likelihood estimation of φ. The model is that the observations {x_ij, 1 ≤ i ≤ m, 1 ≤ j ≤ n} have a joint normal distribution with mean 0 and covariances given by (11). We also assume that σ_ε² is known. The negative log likelihood for φ is then, modulo some fixed constants,

(1/2) Σ_i Σ_j Σ_k Σ_l x_ij x_kl v_ijkl,

where v_ijkl is the component of the inverse covariance matrix at the (i,j) × (k,l) position. The analytical form of this estimating function is completely specified, and it can be rewritten as a quadratic form of independent normal random variables by taking advantage of the representations of x_ij in (12) and (13). We could therefore follow the expansion technique and compare the results with those for the classical MLE, for which the Fisher information approach leads to valid conclusions.

We also plan to work on the practical implementation of all the methods in the spatial setting, using the conclusions derived from the time series setting as a guide. The next step is to analyze a classical data set of large dimension using the proposed methodology and to compare our results with established ones. The final aim is to analyze a particulate matter or ozone data set with too many observations for the classical MLE technique, thereby demonstrating the appeal of the new methodology.
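To illustrate how the Kronecker structure keeps this lattice likelihood tractable, here is a short sketch (ours; names are illustrative): only the two one-dimensional AR(1) covariance matrices are ever factorized, never the full mn × mn matrix.

```python
import numpy as np
from scipy.linalg import toeplitz

def lattice_nll(phi, x, sigma_x2=1.0):
    """Negative log likelihood for the lattice model (11), whose covariance is
    the Kronecker product U (x) U of two one-dimensional AR(1) covariance
    matrices."""
    m, n = x.shape
    Um = sigma_x2 * toeplitz(phi ** np.arange(m))   # covariance along rows (index i)
    Un = toeplitz(phi ** np.arange(n))              # covariance along columns (index j)
    _, ldm = np.linalg.slogdet(Um)
    _, ldn = np.linalg.slogdet(Un)
    # log|Um (x) Un| = n log|Um| + m log|Un|; the quadratic form uses
    # (Um (x) Un)^{-1} = Um^{-1} (x) Un^{-1}, i.e. tr(Um^{-1} x Un^{-1} x^T).
    quad = np.sum(np.linalg.solve(Um, x) * np.linalg.solve(Un, x.T).T)
    return 0.5 * (m * n * np.log(2 * np.pi) + n * ldm + m * ldn + quad)

# Toy check: simulate one 20 x 20 field from (11) and profile over phi.
rng = np.random.default_rng(6)
m, n, phi_true = 20, 20, 0.5
U = toeplitz(phi_true ** np.arange(m))
x = rng.multivariate_normal(np.zeros(m * n), np.kron(U, U)).reshape(m, n)
grid = np.linspace(0.1, 0.9, 17)
print(grid[np.argmin([lattice_nll(p, x) for p in grid])])   # usually near 0.5
```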
References

[1] Billingsley, P. (1995), Probability and Measure, Third Edition. Wiley, New York.
[2] Brockwell, P.J. and Davis, R.A. (1991), Time Series: Theory and Methods, Second Edition. Springer-Verlag, New York.
[3] Cressie, N. (1993), Statistics for Spatial Data, Second Edition. John Wiley, New York.
[4] Holland, D.M., Caragea, P.C. and Smith, R.L. (2001), Trends in rural sulfur concentrations. Preprint.
[5] Holland, D.M., De Oliveira, V., Cox, L.H. and Smith, R.L. (2000), Estimation of regional trends in sulfur dioxide over the eastern United States. Environmetrics, to appear.
[6] Mardia, K.V., Kent, J.T. and Bibby, J.M. (1979), Multivariate Analysis. Academic Press, New York.
[7] Liang, K.Y. and Zeger, S.L. (1986), Longitudinal data analysis using generalized linear models. Biometrika 73, 13-22.
[8] Mardia, K.V. and Marshall, R.J. (1984), Maximum likelihood estimation of models for residual covariance in spatial regression. Biometrika 71, 135-146.
[9] Smith, R.L. (1996), Estimating nonstationary spatial correlations. Preprint, University of North Carolina.
[10] Smith, R.L. (2001), CBMS Course in Environmental Statistics. University of Washington, June 2001.
[11] Stein, M.L. (1999), Interpolation of Spatial Data: Some Theory for Kriging. Springer-Verlag, New York.
[12] Trench, W.F. (1964), An algorithm for the inversion of finite Toeplitz matrices. J. Soc. Indust. Appl. Math. 12, 515-522.
[13] Vecchia, A.V. (1988), Estimation and identification for continuous spatial processes. J. Roy. Statist. Soc. B 50, 297-312.
[14] Whittle, P. (1954), On stationary processes in the plane. Biometrika 41, 434-449.


More information

9.2 Summation Notation

9.2 Summation Notation 9. Summation Notation 66 9. Summation Notation In the previous section, we introduced sequences and now we shall present notation and theorems concerning the sum of terms of a sequence. We begin with a

More information

Principle of Data Reduction

Principle of Data Reduction Chapter 6 Principle of Data Reduction 6.1 Introduction An experimenter uses the information in a sample X 1,..., X n to make inferences about an unknown parameter θ. If the sample size n is large, then

More information

Fitting Subject-specific Curves to Grouped Longitudinal Data

Fitting Subject-specific Curves to Grouped Longitudinal Data Fitting Subject-specific Curves to Grouped Longitudinal Data Djeundje, Viani Heriot-Watt University, Department of Actuarial Mathematics & Statistics Edinburgh, EH14 4AS, UK E-mail: vad5@hw.ac.uk Currie,

More information

December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B. KITCHENS

December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B. KITCHENS December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B KITCHENS The equation 1 Lines in two-dimensional space (1) 2x y = 3 describes a line in two-dimensional space The coefficients of x and y in the equation

More information

MATH 304 Linear Algebra Lecture 9: Subspaces of vector spaces (continued). Span. Spanning set.

MATH 304 Linear Algebra Lecture 9: Subspaces of vector spaces (continued). Span. Spanning set. MATH 304 Linear Algebra Lecture 9: Subspaces of vector spaces (continued). Span. Spanning set. Vector space A vector space is a set V equipped with two operations, addition V V (x,y) x + y V and scalar

More information

The Bivariate Normal Distribution

The Bivariate Normal Distribution The Bivariate Normal Distribution This is Section 4.7 of the st edition (2002) of the book Introduction to Probability, by D. P. Bertsekas and J. N. Tsitsiklis. The material in this section was not included

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

More information

Multivariate Analysis (Slides 13)

Multivariate Analysis (Slides 13) Multivariate Analysis (Slides 13) The final topic we consider is Factor Analysis. A Factor Analysis is a mathematical approach for attempting to explain the correlation between a large set of variables

More information

Statistics Graduate Courses

Statistics Graduate Courses Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

More information

(Quasi-)Newton methods

(Quasi-)Newton methods (Quasi-)Newton methods 1 Introduction 1.1 Newton method Newton method is a method to find the zeros of a differentiable non-linear function g, x such that g(x) = 0, where g : R n R n. Given a starting

More information

Chapter 3. Distribution Problems. 3.1 The idea of a distribution. 3.1.1 The twenty-fold way

Chapter 3. Distribution Problems. 3.1 The idea of a distribution. 3.1.1 The twenty-fold way Chapter 3 Distribution Problems 3.1 The idea of a distribution Many of the problems we solved in Chapter 1 may be thought of as problems of distributing objects (such as pieces of fruit or ping-pong balls)

More information

1 Determinants and the Solvability of Linear Systems

1 Determinants and the Solvability of Linear Systems 1 Determinants and the Solvability of Linear Systems In the last section we learned how to use Gaussian elimination to solve linear systems of n equations in n unknowns The section completely side-stepped

More information

Notes on Factoring. MA 206 Kurt Bryan

Notes on Factoring. MA 206 Kurt Bryan The General Approach Notes on Factoring MA 26 Kurt Bryan Suppose I hand you n, a 2 digit integer and tell you that n is composite, with smallest prime factor around 5 digits. Finding a nontrivial factor

More information