# What does intrinsic mean in statistical estimation?


*Statistics & Operations Research Transactions (SORT), 30 (2), July-December 2006. © Institut d'Estadística de Catalunya.*

**Gloria García¹ and Josep M. Oller²**

¹ Pompeu Fabra University, Barcelona, Spain.
² University of Barcelona, Barcelona, Spain.

**Abstract.** In this paper we review different meanings of the word *intrinsic* in statistical estimation, focusing our attention on the use of this word in the analysis of the properties of an estimator. We review the intrinsic versions of the bias and the mean square error and results analogous to the Cramér-Rao inequality and the Rao-Blackwell theorem. Different results related to the Bernoulli and normal distributions are also considered.

MSC: 62F10, 62B10, 62A99.

Keywords: intrinsic bias, mean square Rao distance, information metric.

## 1 Introduction

Statistical estimation is concerned with specifying, in a certain framework, a plausible probabilistic mechanism which explains observed data. The inherent nature of this problem is inductive, although the process of estimation itself is derived through mathematical deductive reasoning. In parametric statistical estimation the probability is assumed to belong to a class indexed by some parameter. Thus the inductive inferences are usually in the form of point or region estimates of the probabilistic mechanism which has generated some specific data. As these estimates are provided through the estimation of the parameter, a label of the probability, different estimators may lead to different methods of induction.

\* This research is partially sponsored by CGYCIT, PB C0-01 and 1997SGR (Generalitat de Catalunya), Spain. Address for correspondence: J. M. Oller, Departament d'Estadística, Universitat de Barcelona, Diagonal 645, 08028 Barcelona, Spain. Received: April 2006.

Under this approach an estimator should not depend on the specified parametrization of the model: this property is known as the functional invariance of an estimator. At this point, the notion of intrinsic estimation is raised for the first time: an estimator is intrinsic if it satisfies this functional invariance property, and in this way is a real probability measure estimator. On the other hand, the bias and the mean square error (MSE) are the most commonly accepted measures of the performance of an estimator. Nevertheless these concepts are clearly dependent on the model parametrization, and thus unbiasedness and uniformly minimum variance estimation are non-intrinsic. It is also convenient to examine the goodness of an estimator through intrinsic conceptual tools: this is the object of the intrinsic analysis of statistical estimation introduced by Oller & Corcuera (1995) (see also Oller (1993b) and Oller (1993a)). These papers consider an intrinsic measure for the bias and the square error, taking into account that a parametric statistical model with suitable regularity conditions has a natural Riemannian structure given by the information metric. In this setting, the square error loss is replaced by the square of the corresponding Riemannian distance, known as the information distance or the Rao distance, and the bias is redefined through a convenient vector field based on the geometrical properties of the model. It must be pointed out that there exist other possible intrinsic losses, but the square of the Rao distance is the most natural intrinsic version of the square error.
In a recent paper, Bernardo & Juárez (2003) introduce the concept of intrinsic estimation by considering the estimator which minimizes the Bayesian risk, taking as a loss function a symmetrized version of the Kullback-Leibler divergence (Bernardo & Rueda (2002)) and considering a reference prior based on an information-theoretic approach (Bernardo (1979) and Berger & Bernardo (1992)) which is independent of the model parametrization and in some cases coincides with the Jeffreys uniform prior distribution. In the latter case the prior, usually improper, is proportional to the Riemannian volume corresponding to the information metric (Jeffreys (1946)). This estimator is intrinsic as it does not depend on the parametrization of the model. Moreover, observe that both the loss function and the reference prior are derived just from the model, and this gives rise to another notion of intrinsic: an estimation procedure is said to be intrinsic if it is formalized only in terms of the model. Observe that in the framework of information geometry, a concept is intrinsic as far as it has a well-defined geometrical meaning.

In the present paper we review the basic results of the above-mentioned intrinsic analysis of statistical estimation. We also examine, for some concrete examples, the intrinsic estimator obtained by minimizing the Bayesian risk using as an intrinsic loss the square of the Rao distance and as a reference prior the Jeffreys uniform prior. In each case the corresponding estimator is compared with the one obtained by Bernardo & Juárez (2003).

## 2 The intrinsic analysis

As we pointed out before, the bias and the mean square error are not intrinsic concepts. The aim of the intrinsic analysis of statistical estimation is to provide intrinsic tools for the analysis of intrinsic estimators, developing in this way a theory analogous to the classical one, based on some natural geometrical structures of the statistical models. In particular, intrinsic versions of the Cramér-Rao lower bound and the Rao-Blackwell theorem have been established.

We first introduce some notation. Let $(\chi, \mathfrak{a}, \mu)$ be a measure space and $\Theta$ be a connected open set of $\mathbb{R}^n$. Consider a map $f : \chi \times \Theta \to \mathbb{R}$ such that $f(x, \theta) \ge 0$ and $f(x, \theta)\,\mu(dx)$ defines a probability measure on $(\chi, \mathfrak{a})$, to be denoted as $P_\theta$. In the present paper a parametric statistical model is defined as the triple $\{(\chi, \mathfrak{a}, \mu);\ \Theta;\ f\}$. We will refer to $\mu$ as the reference measure of the model and to $\Theta$ as the parameter space. In a general framework $\Theta$ can be any manifold modelled on a convenient space such as $\mathbb{R}^n$, $\mathbb{C}^n$, or any Banach or Hilbert space. So even though the following results can be written in more generality, for the sake of simplicity we consider the above-mentioned form for the parameter space $\Theta$. In that case, it is customary to use the same symbol $\theta$ to denote points and coordinates.

Assume that the parametric statistical model is identifiable, i.e. there exists a one-to-one map between parameters $\theta$ and probabilities $P_\theta$; assume also that $f$ satisfies the regularity conditions which guarantee that the Fisher information matrix exists and is a strictly positive definite matrix. In that case $\Theta$ has a natural Riemannian manifold structure induced by its information metric, and the parametric statistical model is said to be regular. For further details, see Atkinson & Mitchell (1981), Burbea (1986), Burbea & Rao (1982) and Rao (1945), among others.
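To make the information-metric construction concrete, the following sketch (an illustration, not part of the paper) recovers the Fisher information of the Bernoulli model numerically, as the second moment of the score; the closed form $g(\theta) = 1/(\theta(1-\theta))$ reappears in Section 4.1. The tolerance and the evaluation point $\theta = 0.3$ are arbitrary choices of this sketch.

```python
import math

def fisher_info_bernoulli(theta, eps=1e-6):
    """Fisher information of Bernoulli(theta), computed as the second moment
    of the numerically differentiated score d/dtheta log f(x; theta)."""
    def loglik(x, t):
        return x * math.log(t) + (1 - x) * math.log(1 - t)
    def score(x, t):
        return (loglik(x, t + eps) - loglik(x, t - eps)) / (2 * eps)
    # expectation over x in {0, 1} with P(x=1) = theta (the score has zero mean)
    return theta * score(1, theta) ** 2 + (1 - theta) * score(0, theta) ** 2

theta = 0.3
numeric = fisher_info_bernoulli(theta)
closed_form = 1 / (theta * (1 - theta))  # the information metric g(theta)
```

Strict positivity of this quantity on $(0, 1)$ is what makes the information metric a genuine Riemannian metric for this model.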
As we are assuming that the model is identifiable, an estimator $U$ of the true probability measure based on a $k$-size random sample, $k \in \mathbb{N}$, may be defined as a measurable map from $\chi^k$ to the manifold $\Theta$, which induces a probability measure on $\Theta$ known as the image measure and denoted as $\nu_k$. Observe that we are viewing $\Theta$ as a manifold, not as an open set of $\mathbb{R}^n$. To define the bias in an intrinsic way, we need the notion of mean or expected value for a random object valued on the manifold $\Theta$. One way to achieve this purpose is through an affine connection on the manifold. Note that $\Theta$ is equipped with the Levi-Civita connection, corresponding to the Riemannian structure supplied by the information metric.

Next we review the definition of the exponential map. Fix $\theta$ in $\Theta$ and let $T_\theta\Theta$ be the tangent space at $\theta$. Given $\xi \in T_\theta\Theta$, consider a geodesic curve $\gamma_\xi : [0, 1] \to \Theta$ starting at $\theta$ and satisfying $\frac{d\gamma_\xi}{dt}\big|_{t=0} = \xi$. Such a curve exists as far as $\xi$ belongs to an open star-shaped neighbourhood of $0 \in T_\theta\Theta$. In that case, the exponential map is defined as $\exp_\theta(\xi) = \gamma_\xi(1)$. Hereafter, we restrict our attention to the Riemannian case, denoting by $\|\cdot\|_\theta$ the

norm on $T_\theta\Theta$ and by $\rho$ the Riemannian distance. We define
$$ S_\theta = \{\xi \in T_\theta\Theta : \|\xi\|_\theta = 1\} \subset T_\theta\Theta, $$
and for each $\xi \in S_\theta$ we define
$$ c_\theta(\xi) = \sup\{\, t > 0 : \rho(\theta, \gamma_\xi(t)) = t \,\}. $$
If we set
$$ \mathscr{D}_\theta = \{\, t\,\xi \in T_\theta\Theta : 0 \le t < c_\theta(\xi);\ \xi \in S_\theta \,\} $$
and $D_\theta = \exp_\theta(\mathscr{D}_\theta)$, it is well known that $\exp_\theta$ maps $\mathscr{D}_\theta$ diffeomorphically onto $D_\theta$. Moreover, if the manifold is complete, the boundary of $\mathscr{D}_\theta$ is mapped by the exponential map onto the boundary of $D_\theta$, called the cut locus of $\theta$ in $\Theta$. For further details see Chavel (1993).

*Figure 1: The exponential map. A geodesic triangle in the tangent space $T_\theta\Theta$ is mapped onto the manifold; radial distances are preserved.*

For the sake of simplicity, we shall assume that $\nu_k(\Theta \setminus D_\theta) = 0$, whatever true probability measure in the statistical model is considered. In this case, the inverse of the exponential map, $\exp_\theta^{-1}$, is defined $\nu_k$-almost everywhere. For additional details see Chavel (1993), Hicks (1965) or Spivak (1979).

For a fixed sample size $k$, we define the estimator vector field $A$ as
$$ A_\theta(x) = \exp_\theta^{-1}\big(U(x)\big), \qquad \theta \in \Theta, $$
which is a $C^\infty$ random vector field (first order contravariant tensor field) induced on the manifold through the inverse of the exponential map. For a point $\theta \in \Theta$ we denote by $E_\theta$ the expectation computed with respect to the probability distribution corresponding to $\theta$. We say that $\theta$ is a mean value of $U$ if and

only if $E_\theta(A_\theta) = 0$.

*Figure 2: The estimator vector field $A_\theta = \exp_\theta^{-1}(U(x))$.*

It must be pointed out that if a Riemannian centre of mass exists, it satisfies the above condition (see Karcher (1977) and Oller & Corcuera (1995)). We say that an estimator $U$ is intrinsically unbiased if and only if its mean value is the true parameter. A tensorial measure of the bias is the bias vector field $B$, defined as
$$ B_\theta = E_\theta(A_\theta), \qquad \theta \in \Theta. $$
An invariant bias measure is given by the scalar field $\|B\|$ defined as $\|B_\theta\|_\theta$, $\theta \in \Theta$. Notice that if $\|B\| = 0$, the estimator is intrinsically unbiased.

The estimator vector field $A$ also induces an intrinsic measure analogous to the mean square error. The Riemannian risk of $U$ is the scalar field defined as
$$ E_\theta\big( \|A_\theta\|^2_\theta \big) = E_\theta\big( \rho^2(U, \theta) \big), \qquad \theta \in \Theta, $$
since $\|A_\theta(x)\|_\theta = \rho(U(x), \theta)$. Notice that in the Euclidean setting the Riemannian risk coincides with the mean square error using an appropriate coordinate system. Finally note that if a mean value exists and is unique, it is natural to regard the expected value of the square of the Riemannian distance, also known as the Rao distance, between the estimated points and their mean value as an intrinsic version of the variance of the estimator.

To finish this section, it is convenient to note the importance of the selection of a loss function in a statistical problem. Let us consider the estimation of the probability of success $\theta \in (0, 1)$ in a binary experiment where we perform independent trials until the first success. The corresponding density of the number of failures is given by

*Figure 3: The bias vector field. The bias vector $B_\theta = E_\theta\big(\exp_\theta^{-1}(U(x))\big)$ points toward the mean value of the image measure.*

$$ f(k; \theta) = (1-\theta)^k\,\theta; \qquad k = 0, 1, \ldots $$

If we restrict our attention to the class of unbiased estimators, a (classical) unbiased estimator $U$ of $\theta$ must satisfy
$$ \sum_{k=0}^{\infty} U(k)\,(1-\theta)^k\,\theta = \theta, \qquad \theta \in (0, 1), $$
from which it follows that $\sum_{k=0}^{\infty} U(k)\,(1-\theta)^k$ is constant for all $\theta \in (0, 1)$. So $U(0) = 1$ and $U(k) = 0$ for $k \ge 1$. In other words: when the first trial is a success, $U$ assigns $\theta$ equal to 1; otherwise $\theta$ is taken to be 0. Observe that, strictly speaking, there is no (classical) unbiased estimator for $\theta$, since $U$ takes values in the boundary of the parameter space $(0, 1)$. But we can still use the estimator $U$ in a wider setting, extending both the sample space and the parameter space. We can then compare $U$ with the maximum likelihood estimator, $V(k) = 1/(k+1)$ for $k \ge 0$, in terms of the mean square error. After some straightforward calculations, we obtain
$$ E_\theta\big( (U - \theta)^2 \big) = \theta - \theta^2, $$
$$ E_\theta\big( (V - \theta)^2 \big) = \theta^2 + \theta\,\big( \mathrm{Li}_2(1-\theta) + 2\,\theta \ln\theta \big)\big/(1-\theta), $$
where $\mathrm{Li}_2$ is the dilogarithm function. Further details on this function can be found in Abramowitz (1970). The next figure represents the mean square error of both $U$ and $V$.
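Both closed forms can be checked by direct summation over the geometric distribution. The sketch below is an illustration (not from the paper); it implements the dilogarithm by its power series $\mathrm{Li}_2(z) = \sum_{m \ge 1} z^m/m^2$ and also exhibits, at two sample points, the crossover that makes $U$ preferable near the boundary.

```python
import math

def mse_U(theta, terms=2000):
    # U(0) = 1, U(k) = 0 for k >= 1: the only classically unbiased estimator
    return sum((1 - theta) ** j * theta * ((1 if j == 0 else 0) - theta) ** 2
               for j in range(terms))

def mse_V(theta, terms=2000):
    # V(j) = 1/(j+1), the maximum likelihood estimator
    return sum((1 - theta) ** j * theta * (1 / (j + 1) - theta) ** 2
               for j in range(terms))

def dilog(z, terms=4000):
    # Li_2(z) = sum_{m>=1} z^m / m^2, valid for |z| <= 1
    return sum(z ** m / m ** 2 for m in range(1, terms))

theta = 0.4
closed_U = theta - theta ** 2
closed_V = theta ** 2 + theta * (dilog(1 - theta) + 2 * theta * math.log(theta)) / (1 - theta)
```

Evaluating `mse_U` and `mse_V` at a small $\theta$ (say 0.05) and at $\theta = 0.4$ reproduces the qualitative picture of Figure 4: $U$ has the smaller MSE near 0, $V$ elsewhere.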

*Figure 4: MSE of $U$ (dashed line) and $V$ (solid line).*

It follows that there exist points in the parameter space for which the estimator $U$ is preferable to $V$, since $U$ scores less risk; precisely for $\theta \in (0, \theta^*)$, where the upper extreme $\theta^*$ has been evaluated numerically. This admissibility contradicts the common sense that refuses $U$: this estimator assigns $\theta$ to be 0 even when the success occurs in a finite number of trials. This points out the fact that the MSE criterion is not enough to distinguish properly between estimators.

Instead of using the MSE we may compute the Riemannian risk for $U$ and $V$. In the geometric model, the Rao distance $\rho$ is given by
$$ \rho(\theta_1, \theta_2) = 2\,\Big| \operatorname{arctanh}\sqrt{1-\theta_1} - \operatorname{arctanh}\sqrt{1-\theta_2} \Big|, \qquad \theta_1, \theta_2 \in (0, 1), $$
which tends to $+\infty$ when $\theta_1$ or $\theta_2$ tend to 0. So $E_\theta\big(\rho^2(U, \theta)\big) = +\infty$, while $E_\theta\big(\rho^2(V, \theta)\big) < +\infty$. The comparison in terms of Riemannian risk discards the estimator $U$ in favour of the maximum likelihood estimator $V$, as is reasonable to expect. Furthermore we can observe that the estimator $U$, which is classically unbiased, has infinite norm of the bias vector. So $U$ is not even intrinsically unbiased, in contrast to $V$, which has finite bias vector norm.

## 3 Intrinsic version of classical results

In this section we outline a relationship between the unbiasedness and the Riemannian risk, obtaining an intrinsic version of the Cramér-Rao lower bound. These results are obtained through the comparison theorems of Riemannian geometry; see Chavel (1993)

and Oller & Corcuera (1995). Other authors have also worked in this direction, such as Hendriks (1991), where random objects on an arbitrary manifold are considered, obtaining a version of the Cramér-Rao inequality in the case of unbiased estimators. Recent developments on this subject can be found in Smith (2005).

Hereafter we consider the framework described in the previous section. Let $U$ be an estimator corresponding to the regular model $\{(\chi, \mathfrak{a}, \mu);\ \Theta;\ f\}$, where the parameter space $\Theta$ is an $n$-dimensional real manifold, and assume that for all $\theta \in \Theta$, $\nu_k(\Theta \setminus D_\theta) = 0$.

**Theorem 3.1 (Intrinsic Cramér-Rao lower bound).** Let us assume that $E\big(\rho^2(U, \theta)\big)$ exists and that the covariant derivative of $E(A)$ exists and can be obtained by differentiating under the integral sign. Then:

1. We have
$$ E\big(\rho^2(U, \theta)\big) \;\ge\; \frac{\big( \operatorname{div}(B) - E(\operatorname{div}(A)) \big)^2}{kn} + \|B\|^2, $$
where $\operatorname{div}(\cdot)$ stands for the divergence operator.

2. If all the sectional Riemannian curvatures $K$ are bounded from above by a non-positive constant $\mathcal{K}$ and $\operatorname{div}(B) \ge -n$, then
$$ E\big(\rho^2(U, \theta)\big) \;\ge\; \frac{\Big( \operatorname{div}(B) + 1 + (n-1)\sqrt{-\mathcal{K}}\,\|B\|\coth\big(\sqrt{-\mathcal{K}}\,\|B\|\big) \Big)^2}{kn} + \|B\|^2. $$

3. If all sectional Riemannian curvatures $K$ are bounded from above by a positive constant $\mathcal{K}$ and $d(\Theta) < \pi/(2\sqrt{\mathcal{K}})$, where $d(\Theta)$ is the diameter of the manifold, and $\operatorname{div}(B) \ge -1$, then
$$ E\big(\rho^2(U, \theta)\big) \;\ge\; \frac{\Big( \operatorname{div}(B) + 1 + (n-1)\sqrt{\mathcal{K}}\,d(\Theta)\cot\big(\sqrt{\mathcal{K}}\,d(\Theta)\big) \Big)^2}{kn} + \|B\|^2. $$

In particular, for intrinsically unbiased estimators, we have:

4. If all sectional Riemannian curvatures are non-positive, then
$$ E\big(\rho^2(U, \theta)\big) \;\ge\; \frac{n}{k}. $$

5. If all sectional curvatures are less than or equal to a positive constant $\mathcal{K}$ and $d(\Theta) < \pi/(2\sqrt{\mathcal{K}})$, then
$$ E\big(\rho^2(U, \theta)\big) \;\ge\; \frac{\Big( 1 + (n-1)\sqrt{\mathcal{K}}\,d(\Theta)\cot\big(\sqrt{\mathcal{K}}\,d(\Theta)\big) \Big)^2}{kn}. $$

The last result shows up the effect of the Riemannian sectional curvature on the precision which can be attained by an estimator.

Observe also that any one-dimensional manifold corresponding to a one-parameter family of probability distributions is always Euclidean and $\operatorname{div}(B) \ge -1$; thus part 2 of Theorem 3.1 applies. There are also some well-known families of probability distributions which satisfy the assumptions of this last theorem, such as the multinomial, see Atkinson & Mitchell (1981), the negative multinomial distribution, see Oller & Cuadras (1985), or the extreme value distributions, see Oller (1987), among many others.

It is easy to check that in the $n$-variate normal case with known covariance matrix $\Sigma$, where the Rao distance is the Mahalanobis distance, the sample mean based on a sample of size $k$ is an estimator that attains the intrinsic Cramér-Rao lower bound, since
$$ E\big(\rho^2(\bar X, \mu)\big) = E\big( (\bar X - \mu)^T \Sigma^{-1} (\bar X - \mu) \big) = E\big( \operatorname{tr}\big(\Sigma^{-1}(\bar X - \mu)(\bar X - \mu)^T\big) \big) = \operatorname{tr}\big( \Sigma^{-1} E\big((\bar X - \mu)(\bar X - \mu)^T\big) \big) = \operatorname{tr}\big( \tfrac{1}{k} I \big) = \frac{n}{k}, $$
where $v^T$ is the transpose of a vector $v$.

Next we consider a tensorial version of the Cramér-Rao inequality. First we define the dispersion tensor corresponding to an estimator $U$ as
$$ S_\theta = E_\theta\big( A_\theta \otimes A_\theta \big), \qquad \theta \in \Theta. $$

**Theorem 3.2.** The dispersion tensor $S$ satisfies
$$ S \;\ge\; \frac{1}{k}\, \operatorname{Tr}_{2,4}\Big[ G^{2,3}\big[ (\nabla B - E(\nabla A)) \otimes (\nabla B - E(\nabla A)) \big] \Big] + B \otimes B, $$
where $\operatorname{Tr}_{i,j}$ and $G^{i,j}$ are, respectively, the contraction and raising operators on indices $i, j$, and $\nabla$ is the covariant derivative. Here the inequality denotes that the difference between the left- and the right-hand side is non-negative definite.

Now we study how we can decrease the mean square Rao distance of a given estimator. Classically this is achieved by taking the conditional mean value with respect to a sufficient statistic; we shall follow a similar procedure here. But now our random objects are valued on a manifold: we need to define the conditional mean value concept in this case and then obtain an intrinsic version of the Rao-Blackwell theorem.

Let $(\chi, \mathfrak{a}, P)$ be a probability space. Let $M$ be an $n$-dimensional, complete and connected Riemannian manifold. Then $M$ is a complete separable metric space (a Polish space) and we will have a regular version of the conditional probability of any $M$-valued random object $f$ with respect to any $\sigma$-algebra $\mathcal{D} \subset \mathfrak{a}$ on $\chi$.
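The attainment of the $n/k$ bound by the sample mean in the normal example above can be illustrated by simulation. The sketch below is an illustration under assumed settings ($n = 2$, $k = 5$, $\Sigma = I$, $\mu = 0$); with identity covariance the Mahalanobis distance reduces to the Euclidean one.

```python
import random

def simulate_mean_risk(n=2, k=5, reps=20000, seed=0):
    """Monte Carlo estimate of E[rho^2(Xbar, mu)] for an n-variate normal with
    identity covariance: rho is the Mahalanobis distance, here Euclidean."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        # sample mean of k standard normal vectors; true mu = 0
        xbar = [sum(rng.gauss(0.0, 1.0) for _ in range(k)) / k for _ in range(n)]
        total += sum(c * c for c in xbar)  # squared distance to mu = 0
    return total / reps

risk = simulate_mean_risk()
bound = 2 / 5  # n/k, the intrinsic Cramer-Rao lower bound for unbiased estimators
```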

In the case where the mean square of the Riemannian distance $\rho$ of $f$ exists, we can define
$$ E\big( \rho^2(f, m) \mid \mathcal{D} \big)(x) = \int_M \rho^2(t, m)\, P_{f \mid \mathcal{D}}(x, dt), $$
where $x \in \chi$, $B$ is a Borelian set in $M$ and $P_{f \mid \mathcal{D}}(x, B)$ is a regular conditional probability of $f$ given $\mathcal{D}$.
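The mean value used throughout, the Riemannian centre of mass characterized by a vanishing expected log map, can be pictured on the unit circle, the simplest curved example. The sketch below is purely illustrative (the circle is not one of the paper's statistical models); it finds the minimizer of the sum of squared arc distances by gradient descent and contrasts it with the naive coordinate mean.

```python
import math

def signed_arc(a, m):
    # signed geodesic "log map" on the unit circle: arc from m to a, in [-pi, pi)
    return (a - m + math.pi) % (2 * math.pi) - math.pi

def intrinsic_mean(angles, steps=2000, lr=0.1):
    """Riemannian centre of mass on the circle: the point m where the mean of
    the log maps vanishes, found by gradient descent on sum of rho^2."""
    m = angles[0]
    for _ in range(steps):
        g = sum(signed_arc(a, m) for a in angles) / len(angles)
        m = (m + lr * g) % (2 * math.pi)
    return m

# three angles clustered around 0 (6.2 rad sits just below 2*pi)
sample = [0.1, 0.2, 6.2]
m = intrinsic_mean(sample)
naive = sum(sample) / len(sample)  # the coordinate mean ignores the geometry
```

The intrinsic mean lands near 0, where the data actually cluster, while the coordinate mean is pulled far away by the wrap-around, which is exactly why mean values on manifolds must be defined through the exponential map rather than through coordinates.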

If for each $x \in \chi$ there exists a unique mean value $p \in M$ corresponding to the conditional probability $P_{f \mid \mathcal{D}}(x, B)$, i.e. a point $p \in M$ such that
$$ \int_M \exp_p^{-1}(t)\, P_{f \mid \mathcal{D}}(x, dt) = 0_p, $$
we have a map from $\chi$ to $M$ that assigns, to each $x$, the mean value corresponding to $P_{f \mid \mathcal{D}}(x, B)$. Therefore, if $f$ is a random object on $M$ and $\mathcal{D} \subset \mathfrak{a}$ is a $\sigma$-algebra on $\chi$, we can define the conditional mean value of $f$ with respect to $\mathcal{D}$, denoted by $\mathfrak{M}(f \mid \mathcal{D})$, as a $\mathcal{D}$-measurable map $Z$ such that
$$ E\big( \exp_Z^{-1}(f) \mid \mathcal{D} \big) = 0_Z, $$
provided it exists. A sufficient condition to assure that the mean value exists and is uniquely defined is the existence of an open geodesically convex subset $N \subset M$ such that $P\{f \in N\} = 1$. Finally, it is necessary to mention that $\mathfrak{M}\big(\mathfrak{M}(f \mid \mathcal{D})\big) \neq \mathfrak{M}(f)$ in general; see for instance Kendall (1990).

Let us apply these notions to statistical point estimation. Given the regular parametric statistical model $\{(\chi, \mathfrak{a}, \mu);\ \Theta;\ f\}$, we assume that $\Theta$ is complete or that there exists a metric space isometry with a subset of a complete and connected Riemannian manifold. We recall now that a real-valued function $h$ on a manifold, equipped with an affine connection, is said to be convex if for any geodesic $\gamma$, $h \circ \gamma$ is a convex function. Then we have the following result.

**Theorem 3.3 (Intrinsic Rao-Blackwell).** Let $\mathcal{D}$ be a sufficient $\sigma$-algebra for the statistical model. Consider an estimator $U$ such that $\mathfrak{M}(U \mid \mathcal{D})$ is well defined. If $\theta$ is such that $\rho^2(\theta, \cdot)$ is convex, then
$$ E_\theta\big( \rho^2(\mathfrak{M}(U \mid \mathcal{D}), \theta) \big) \;\le\; E_\theta\big( \rho^2(U, \theta) \big). $$

The proof is based on Kendall (1990). Sufficient conditions for the hypothesis of the previous theorem are given in the following result.

**Theorem 3.4.** If the sectional curvatures of $N$ are at most $0$, or at most $\mathcal{K} > 0$ with $d(N) < \pi/(2\sqrt{\mathcal{K}})$, where $d(N)$ is the diameter of $N$, then $\rho^2(\theta, \cdot)$ is convex for all $\theta \in \Theta$.

It is not necessarily true that the mean of the square of the Riemannian distance between the true and estimated densities decreases when conditioning on $\mathcal{D}$.
For instance, if some of the curvatures are positive and we do not have further information about the diameter of the manifold, we cannot be sure about the convexity of the square of the Riemannian distance. On the other hand, the efficiency of the estimators can be improved by conditioning with respect to a sufficient $\sigma$-algebra $\mathcal{D}$, obtaining $\mathfrak{M}(U \mid \mathcal{D})$. But in general the bias is

not preserved, in contrast to the classical Rao-Blackwell theorem; in other words, even if $U$ were intrinsically unbiased, $\mathfrak{M}(U \mid \mathcal{D})$ would not in general be intrinsically unbiased, since $\mathfrak{M}\big(\mathfrak{M}(U \mid \mathcal{D})\big) \neq \mathfrak{M}(U)$. However, the norm of the bias tensor of $\mathfrak{M}(U \mid \mathcal{D})$ is bounded: if we let $B^{\mathfrak{M}(U \mid \mathcal{D})}$ be the bias tensor, by the Jensen inequality,
$$ \big\| B^{\mathfrak{M}(U \mid \mathcal{D})}_\theta \big\|^2_\theta \;\le\; E_\theta\big( \rho^2(\mathfrak{M}(U \mid \mathcal{D}), \theta) \big) \;\le\; E_\theta\big( \rho^2(U, \theta) \big). $$

## 4 Examples

This section is devoted to examining the goodness of some estimators for several models. Different principles apply in order to select a convenient estimator; here we consider the estimator that minimizes the Riemannian risk for a prior distribution proportional to the Riemannian volume. This approach is related to the ideas developed by Bernardo & Juárez (2003), where the authors consider as a loss function a symmetrized version of the Kullback-Leibler divergence instead of the square of the Rao distance, and use a reference prior which, in some cases, coincides with the Riemannian volume. Once that estimator is obtained, we examine its intrinsic performance: we compute the corresponding Riemannian risk and its bias vector, precisely the square norm of the intrinsic bias. We also compare this estimator with the maximum likelihood estimator.

### 4.1 Bernoulli

Let $X_1, \ldots, X_k$ be a random sample of size $k$ from a Bernoulli distribution with parameter $\theta$, that is, with probability density $f(x; \theta) = \theta^x (1-\theta)^{1-x}$, for $x \in \{0, 1\}$. In that case, the parameter space is $\Theta = (0, 1)$ and the metric tensor is given by
$$ g(\theta) = \frac{1}{\theta(1-\theta)}. $$
We assume the prior distribution $\pi$ for $\theta$ to be the Jeffreys prior, that is,
$$ \pi(\theta) \propto \frac{1}{\sqrt{\theta(1-\theta)}}. $$
The corresponding joint density of $\theta$ and $(X_1, \ldots, X_k)$ is then proportional to
$$ \frac{1}{\sqrt{\theta(1-\theta)}}\; \theta^{\sum_{i=1}^k X_i}\, (1-\theta)^{k - \sum_{i=1}^k X_i} = \theta^{\sum_{i=1}^k X_i - \frac{1}{2}}\, (1-\theta)^{k - \sum_{i=1}^k X_i - \frac{1}{2}}, $$
which depends on the sample through the sufficient statistic $T = \sum_{i=1}^k X_i$. When $(X_1, \ldots, X_k) = (x_1, \ldots, x_k)$, put $T = t$.

Since
$$ \int_0^1 \theta^{t - \frac{1}{2}} (1-\theta)^{k - t - \frac{1}{2}}\, d\theta = \operatorname{Beta}\big( t + \tfrac{1}{2},\ k - t + \tfrac{1}{2} \big), $$
the posterior distribution $\pi(\cdot \mid t)$ based on the Jeffreys prior is as follows:
$$ \pi(\theta \mid t) = \frac{1}{\operatorname{Beta}\big( t + \tfrac{1}{2},\ k - t + \tfrac{1}{2} \big)}\, \theta^{t - \frac{1}{2}} (1-\theta)^{k - t - \frac{1}{2}}, $$
where $\operatorname{Beta}$ is the Euler beta function. The Bayes estimator related to the loss function given by the square of the Rao distance $\rho^2$ is
$$ \theta_b(t) = \arg\min_{\theta_e \in (0, 1)} \int_0^1 \rho^2(\theta_e, \theta)\, \pi(\theta \mid t)\, d\theta. $$
Since an intrinsic estimation procedure is invariant under reparametrization, we perform the change of coordinates defined through the equation
$$ 1 = \Big( \frac{d\theta}{d\xi} \Big)^2 \frac{1}{\theta(1-\theta)} $$
in order to obtain a metric tensor equal to 1: the Riemannian distance expressed via this coordinate system, known as a Cartesian coordinate system, will coincide with the Euclidean distance between the new coordinates. If we solve this differential equation with the initial condition $\xi(0) = 0$, we obtain $\xi = 2\arcsin(\sqrt{\theta})$ and $\xi = -2\arcsin(\sqrt{\theta})$; we only consider the first of these two solutions. After some straightforward computations we obtain
$$ \rho(\theta_1, \theta_2) = 2 \arccos\Big( \sqrt{\theta_1\theta_2} + \sqrt{(1-\theta_1)(1-\theta_2)} \Big) = |\xi_1 - \xi_2| \tag{1} $$
for $\xi_1 = 2\arcsin(\sqrt{\theta_1})$ and $\xi_2 = 2\arcsin(\sqrt{\theta_2})$, $\theta_1, \theta_2 \in \Theta$.

In the Cartesian setting, the Bayes estimator $\xi_b(t)$ is equal to the expected value of $\xi$ with respect to the posterior distribution
$$ \pi(\xi \mid t) = \frac{1}{\operatorname{Beta}\big( t + \tfrac{1}{2},\ k - t + \tfrac{1}{2} \big)} \Big( \sin^2\big(\tfrac{\xi}{2}\big) \Big)^{t} \Big( \cos^2\big(\tfrac{\xi}{2}\big) \Big)^{k - t}. $$
Once we apply the change of coordinates $\theta = \sin^2(\xi/2)$, the estimator $\xi_b(t)$ is
$$ \xi_b(t) = \frac{1}{\operatorname{Beta}\big( t + \tfrac{1}{2},\ k - t + \tfrac{1}{2} \big)} \int_0^1 2\arcsin\big(\sqrt{\theta}\big)\, \theta^{t - \frac{1}{2}} (1-\theta)^{k - t - \frac{1}{2}}\, d\theta. $$
Expanding $2\arcsin(\sqrt{\theta})$ in power series of $\theta$,

$$ 2\arcsin\big(\sqrt{\theta}\big) = \frac{2}{\sqrt{\pi}} \sum_{j=0}^{\infty} \frac{\Gamma\big( j + \tfrac{1}{2} \big)}{j!\,(2j+1)}\, \theta^{\,j + \frac{1}{2}}, $$
where $\Gamma$ is the Euler gamma function. After some computations, we obtain
$$ \xi_b(t) = \frac{2\,\Gamma(k+1)\,\Gamma(t+1)}{\Gamma\big( t + \tfrac{1}{2} \big)\,\Gamma\big( k + \tfrac{3}{2} \big)}\; {}_3F_2\Big( \tfrac{1}{2}, \tfrac{1}{2}, t+1;\ k + \tfrac{3}{2}, \tfrac{3}{2};\ 1 \Big), \tag{2} $$
where ${}_3F_2$ denotes a generalized hypergeometric function. Further details on the gamma, beta and hypergeometric functions can be found in Erdélyi et al. (1955). Finally the Bayes estimator $\theta_b(t)$ of $\theta$ is given by
$$ \theta_b(t) = \sin^2\Bigg( \frac{\Gamma(k+1)\,\Gamma(t+1)}{\Gamma\big( t + \tfrac{1}{2} \big)\,\Gamma\big( k + \tfrac{3}{2} \big)}\; {}_3F_2\Big( \tfrac{1}{2}, \tfrac{1}{2}, t+1;\ k + \tfrac{3}{2}, \tfrac{3}{2};\ 1 \Big) \Bigg). $$
It is straightforward to prove that
$$ \theta_b(k - t) = 1 - \theta_b(t), $$
and $\theta_b(t)$ can be approximated by
$$ \theta_a(t) = \frac{t}{k} + \Big( \frac{1}{2} - \frac{t}{k} \Big) \Big( \frac{0.63}{k} - \frac{0.3}{k^2} \Big), $$
with relative errors less than 3.5% for any result based on sample sizes $k$ up to 100. The behaviour of these estimators, for different values of $k$ and for small $t$, is shown in the following table.

*(Table: values of $\theta_b(0)$, $\theta_b(1)$, $\theta_b(2)$ and $\theta_a(0)$, $\theta_a(1)$, $\theta_a(2)$ for several sample sizes $k$.)*

Observe that these estimators do not estimate $\theta$ as zero when $t = 0$, similarly to the estimator obtained by Bernardo & Juárez (2003), which is particularly useful when we are dealing with rare events and small sample sizes.

The Riemannian risk of this intrinsic estimator has been evaluated numerically and is represented in Figure 5. Note that the results are given in terms of the Cartesian coordinates $\xi_b$, in order to guarantee that the physical distance in the plots is proportional to the Rao distance. The Riemannian risk of $\theta_b$ is given by

$$ E_\theta\big( \rho^2(\theta_b, \theta) \big) = E_\xi\big( (\xi_b - \xi)^2 \big) = \sum_{t=0}^{k} \binom{k}{t} \big( \xi_b(t) - \xi \big)^2 \sin^{2t}\big(\tfrac{\xi}{2}\big) \cos^{2(k-t)}\big(\tfrac{\xi}{2}\big), $$
which can be numerically computed through expression (2). This can be compared with the numerical evaluation of the Riemannian risk of the maximum likelihood estimator $\hat\theta = t/k$, given by
$$ E_\theta\big( \rho^2(\hat\theta, \theta) \big) = E_\xi\big( (\hat\xi - \xi)^2 \big) = \sum_{t=0}^{k} \binom{k}{t} \Big( 2\arcsin\sqrt{\tfrac{t}{k}} - \xi \Big)^2 \sin^{2t}\big(\tfrac{\xi}{2}\big) \cos^{2(k-t)}\big(\tfrac{\xi}{2}\big), $$
as we can see in Figures 5 and 6. We point out that the computation of the Riemannian risk for the maximum likelihood estimator requires the extension by continuity of the Rao distance given in (1) to the closure of the parameter space $\Theta$, as $\hat\theta$ takes values on $[0, 1]$.

*Figure 5: Riemannian risk of $\xi_b$, for $k=1$ (solid line), $k=2$ (long dashed line), $k=10$ (short dashed line) and $k=30$ (dotted line).*

For a fixed sample size, observe that the Riemannian risk corresponding to $\xi_b$ is lower than the Riemannian risk corresponding to $\hat\xi$ in a considerable portion of the parameter space, as is clearly shown in Figure 7.
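These two finite sums are easy to evaluate directly. The sketch below is an illustration: it computes $\xi_b(t)$ by numerical quadrature of the posterior mean rather than through the ${}_3F_2$ form, uses the identity $\sin^{2t}(\xi/2)\cos^{2(k-t)}(\xi/2) = \theta^t(1-\theta)^{k-t}$, and picks $k = 10$ and $\theta = 0.1$ (a point near the boundary, where the Bayes estimator is expected to do better) as arbitrary settings.

```python
import math

def xi(theta):
    # Cartesian coordinate xi = 2*arcsin(sqrt(theta)); the Rao distance is |xi1 - xi2|
    return 2 * math.asin(math.sqrt(theta))

def beta_fn(a, b):
    # Euler beta function via log-gamma
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def xi_b(t, k, grid=20000):
    """Posterior mean of xi under the Jeffreys posterior Beta(t+1/2, k-t+1/2),
    computed by midpoint-rule quadrature instead of the 3F2 closed form."""
    h = 1.0 / grid
    total = 0.0
    for i in range(grid):
        th = (i + 0.5) * h
        total += xi(th) * th ** (t - 0.5) * (1.0 - th) ** (k - t - 0.5)
    return total * h / beta_fn(t + 0.5, k - t + 0.5)

def riemannian_risk(estimates, theta, k):
    # sum_t C(k,t) (xi_est(t) - xi)^2 theta^t (1-theta)^(k-t)
    x = xi(theta)
    return sum(math.comb(k, t) * theta ** t * (1.0 - theta) ** (k - t)
               * (estimates[t] - x) ** 2 for t in range(k + 1))

k = 10
bayes = [xi_b(t, k) for t in range(k + 1)]
mle = [xi(t / k) for t in range(k + 1)]
risk_bayes = riemannian_risk(bayes, 0.1, k)
risk_mle = riemannian_risk(mle, 0.1, k)
```

At $\theta = 0.1$ the Bayes estimator's risk is noticeably smaller, matching the qualitative behaviour shown in Figure 7; the symmetry $\theta_b(k-t) = 1 - \theta_b(t)$ appears here as $\xi_b(t) + \xi_b(k-t) = \pi$.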

*Figure 6: Riemannian risk of $\hat\xi$, for $k=1$ (solid line), $k=2$ (long dashed line), $k=10$ (short dashed line) and $k=30$ (dotted line).*

*Figure 7: Riemannian risk corresponding to $\hat\theta$, for $k=1$ (solid line), $k=2$ (long dashed line), and corresponding to $\theta_b$, for $k=1$ (short dashed line) and $k=2$ (dotted line).*

Note that part of the Riemannian risk comes up through the bias of an estimator. Next, the square of the norm of the bias vector $B_b$ for $\theta_b$ and $\hat B$ for $\hat\theta$ is evaluated numerically. Formally, in the Cartesian coordinate system $\xi$,

$$ \|B_b\|^2_\xi = \big( E_\xi(\xi_b) - \xi \big)^2 = \Bigg( \sum_{t=0}^{k} \binom{k}{t} \big( \xi_b(t) - \xi \big) \sin^{2t}\big(\tfrac{\xi}{2}\big) \cos^{2(k-t)}\big(\tfrac{\xi}{2}\big) \Bigg)^2, $$
$$ \|\hat B\|^2_\xi = \big( E_\xi(\hat\xi) - \xi \big)^2 = \Bigg( \sum_{t=0}^{k} \binom{k}{t} \Big( 2\arcsin\sqrt{\tfrac{t}{k}} - \xi \Big) \sin^{2t}\big(\tfrac{\xi}{2}\big) \cos^{2(k-t)}\big(\tfrac{\xi}{2}\big) \Bigg)^2. $$
The squared norms of the bias vectors $B_b$ and $\hat B$ are represented in Figures 8 and 9 respectively.

*Figure 8: $\|B_b\|^2$ for $k=1$ (solid line), $k=2$ (long dashed line), $k=10$ (short dashed line) and $k=30$ (dotted line).*

Now, when the sample size is fixed, the intrinsic bias corresponding to $\xi_b$ is greater than the intrinsic bias corresponding to $\hat\xi$ in a wide range of values of the model parameter, that is, the opposite behaviour to that shown by the Riemannian risk.

### 4.2 Normal with known mean value

Let $X_1, \ldots, X_k$ be a random sample of size $k$ from a normal distribution with known mean value $\mu_0$ and standard deviation $\sigma$. Now the parameter space is $\Theta = (0, +\infty)$ and the metric tensor for the $N(\mu_0, \sigma)$ model is given by
$$ g(\sigma) = \frac{2}{\sigma^2}. $$
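Analogously to the Bernoulli case, this metric coefficient can be checked numerically as the second moment of the score with respect to $\sigma$. The sketch below is an illustration under assumed values ($\mu_0 = 0$, $\sigma = 1.5$), using Monte Carlo over the model and a finite-difference derivative.

```python
import math, random

def fisher_info_sigma(sigma, mu0=0.0, reps=200000, eps=1e-5, seed=3):
    """Monte Carlo Fisher information for sigma in N(mu0, sigma):
    the second moment of the score d/dsigma log f(x; sigma)."""
    rng = random.Random(seed)
    def logf(x, s):
        return -math.log(s) - 0.5 * math.log(2 * math.pi) - (x - mu0) ** 2 / (2 * s * s)
    total = 0.0
    for _ in range(reps):
        x = rng.gauss(mu0, sigma)
        score = (logf(x, sigma + eps) - logf(x, sigma - eps)) / (2 * eps)
        total += score * score
    return total / reps

info = fisher_info_sigma(1.5)  # closed form: g(sigma) = 2 / sigma**2
```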

*Figure 9: $\|\hat B\|^2$ for $k=1$ and $k=2$ (solid line, the same curve), $k=10$ (dashed line) and $k=30$ (dotted line).*

We shall assume again the Jeffreys prior distribution for $\sigma$. Thus the joint density for $\sigma$ and $(X_1, \ldots, X_k)$ is proportional to
$$ \frac{1}{\sigma^{k+1}} \exp\Big( -\frac{1}{2\sigma^2} \sum_{i=1}^{k} (X_i - \mu_0)^2 \Big), $$
depending on the sample through the sufficient statistic $S^2 = \frac{1}{k} \sum_{i=1}^{k} (X_i - \mu_0)^2$. When $(X_1, \ldots, X_k) = (x_1, \ldots, x_k)$, put $S^2 = s^2$. As
$$ \int_0^{\infty} \frac{1}{\sigma^{k+1}} \exp\Big( -\frac{k s^2}{2\sigma^2} \Big)\, d\sigma = \frac{2^{\frac{k}{2} - 1}\, \Gamma\big( \tfrac{k}{2} \big)}{(k s^2)^{\frac{k}{2}}}, $$
the corresponding posterior distribution $\pi(\cdot \mid s^2)$ based on the Jeffreys prior satisfies
$$ \pi(\sigma \mid s^2) = \frac{(k s^2)^{\frac{k}{2}}}{2^{\frac{k}{2} - 1}\, \Gamma\big( \tfrac{k}{2} \big)}\, \frac{1}{\sigma^{k+1}} \exp\Big( -\frac{k s^2}{2\sigma^2} \Big). $$
Denote by $\rho$ the Rao distance for the $N(\mu_0, \sigma)$ model. As we did in the previous example, instead of directly determining
$$ \sigma_b(s^2) = \arg\min_{\sigma_e \in (0, +\infty)} \int_0^{+\infty} \rho^2(\sigma_e, \sigma)\, \pi(\sigma \mid s^2)\, d\sigma, $$
we perform a change of coordinates to obtain a Cartesian coordinate system. Then we compute the Bayes estimator for the new coordinate $\theta$; as the estimator obtained in this way is intrinsic, we finish the argument recovering $\sigma$ from $\theta$. Formally, the

change of coordinates for which the metric tensor is constant and equal to 1 is obtained by solving the following differential equation:
$$ 1 = \frac{2}{\sigma^2} \Big( \frac{d\sigma}{d\theta} \Big)^2, $$
with the initial condition $\theta(1) = 0$. We obtain $\theta = \sqrt{2}\,\ln\sigma$ and $\theta = -\sqrt{2}\,\ln\sigma$; we only consider the first of these two solutions. We then obtain
$$ \rho(\sigma_1, \sigma_2) = \sqrt{2}\, \Big| \ln\frac{\sigma_2}{\sigma_1} \Big| = |\theta_1 - \theta_2| \tag{3} $$
for $\theta_1 = \sqrt{2}\,\ln\sigma_1$ and $\theta_2 = \sqrt{2}\,\ln\sigma_2$.

In the Cartesian setting, the Bayes estimator $\theta_b(s^2)$ for $\theta$ is the expected value of $\theta$ with respect to the corresponding posterior distribution, to be denoted as $\pi(\cdot \mid s^2)$. The integral can be solved, after performing the change of coordinates $\theta = \sqrt{2}\,\ln\sigma$, in terms of the gamma function $\Gamma$ and the digamma function $\Psi$, that is, the logarithmic derivative of $\Gamma$. Formally,
$$ \theta_b(s^2) = \frac{(k s^2)^{\frac{k}{2}}}{2^{\frac{k}{2} - 1}\, \Gamma\big( \tfrac{k}{2} \big)} \int_0^{+\infty} \frac{\sqrt{2}\,\ln\sigma}{\sigma^{k+1}} \exp\Big( -\frac{k s^2}{2\sigma^2} \Big)\, d\sigma = \frac{1}{\sqrt{2}} \Big( \ln\frac{k s^2}{2} - \Psi\big( \tfrac{k}{2} \big) \Big). $$
The Bayes estimator for $\sigma$ is then
$$ \sigma_b(s^2) = \sqrt{\frac{k s^2}{2}}\, \exp\Big( -\frac{1}{2} \Psi\big( \tfrac{k}{2} \big) \Big). $$
Observe that this estimator $\sigma_b$ is a multiple of the maximum likelihood estimator $\hat\sigma = s$. We can evaluate the proportionality factor $\sqrt{k/2}\, \exp\big( -\tfrac{1}{2}\Psi(\tfrac{k}{2}) \big)$ for some values of $k$, obtaining the following table.

*(Table: values of the proportionality factor $\sqrt{k/2}\, \exp\big( -\tfrac{1}{2}\Psi(\tfrac{k}{2}) \big)$ for several values of $k$.)*
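The proportionality factor is easy to tabulate. The sketch below is illustrative; Python's standard library lacks a digamma function, so it is approximated here by a central difference of `math.lgamma` (an implementation choice of this sketch, not the paper's).

```python
import math

def digamma(x, eps=1e-6):
    # Python's stdlib has no digamma; use a central difference of lgamma
    return (math.lgamma(x + eps) - math.lgamma(x - eps)) / (2 * eps)

def sigma_b_factor(k):
    """Proportionality factor between the Bayes estimator and the MLE:
    sigma_b = sqrt(k/2) * exp(-digamma(k/2)/2) * s."""
    return math.sqrt(k / 2) * math.exp(-digamma(k / 2) / 2)

factors = {k: sigma_b_factor(k) for k in (1, 2, 5, 10, 50)}
# the factor exceeds 1 (sigma_b inflates s) and tends to 1 as k grows
```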

The Riemannian risk of $\sigma_b$ is given by
$$ E_\sigma\big( \rho^2(\sigma_b, \sigma) \big) = E_\theta\big( (\theta_b - \theta)^2 \big) = \frac{1}{2}\, \Psi'\big( \tfrac{k}{2} \big), $$
where $\Psi'$ is the derivative of the digamma function. Observe that the Riemannian risk is constant on the parameter space.

Additionally we can compute the square of the norm of the bias vector corresponding to $\sigma_b$. In the Cartesian coordinate system $\theta$, and taking into account that $\frac{k S^2}{\sigma^2}$ is distributed as $\chi^2_k$, we have
$$ B_b{}_\theta = E_\theta(\theta_b) - \theta = \frac{1}{\sqrt{2}}\, E_\sigma\Big( \ln\frac{k S^2}{2} - \Psi\big( \tfrac{k}{2} \big) \Big) - \sqrt{2}\,\ln\sigma = \frac{1}{\sqrt{2}}\, E_\sigma\Big( \ln\frac{k S^2}{2\sigma^2} - \Psi\big( \tfrac{k}{2} \big) \Big) = 0. $$
That is, the estimator $\sigma_b$ is intrinsically unbiased. The bias vector corresponding to $\hat\sigma$ is given by
$$ \hat B_\theta = E_\theta(\hat\theta) - \theta = \sqrt{2}\, E_\sigma\big( \ln S - \ln\sigma \big) = \frac{1}{\sqrt{2}} \Big( \Psi\big( \tfrac{k}{2} \big) - \ln\frac{k}{2} \Big), $$
which indicates that the estimator $\hat\sigma$ has a non-null intrinsic bias.

Furthermore, the Bayes estimator $\sigma_b$ also satisfies the following interesting property related to unbiasedness: it is the equivariant estimator under the action of the multiplicative group $\mathbb{R}^+$ that uniformly minimizes the Riemannian risk. We can summarize the current statistical problem to the model corresponding to the sufficient statistic $S^2$, which follows a gamma distribution with shape parameter $\tfrac{k}{2}$. This family is invariant under the action of the multiplicative group $\mathbb{R}^+$ and it is straightforward to obtain that the equivariant estimators of $\sigma$ which are functions of $S = \sqrt{S^2}$ are of the form
$$ T_\lambda(S^2) = \lambda S, \qquad \lambda \in (0, +\infty), $$
a family of estimators which contains $\sigma_b = \sqrt{k/2}\, \exp\big( -\tfrac{1}{2}\Psi(\tfrac{k}{2}) \big)\, S$, the Bayes estimator, and the maximum likelihood estimator $\hat\sigma = S$.

In order to obtain the equivariant estimator that minimizes the Riemannian risk, observe that the Rao distance (3) is an invariant loss function with respect to the induced group in the parameter space. This, together with the fact that the induced group acts transitively on the parameter space, makes the risk of any equivariant estimator constant on all the parameter space, among them the risk for $\sigma_b$ and for $S$, as was shown before.
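Both the constancy of the risk and its value $\tfrac{1}{2}\Psi'(k/2)$ can be checked by simulation. The sketch below is an illustration under assumed settings ($\mu_0 = 0$, $k = 5$, two arbitrary values of $\sigma$); digamma and trigamma are approximated by finite differences of `math.lgamma`.

```python
import math, random

def lgamma_d1(x, eps=1e-6):
    # digamma via central difference of lgamma (no stdlib digamma)
    return (math.lgamma(x + eps) - math.lgamma(x - eps)) / (2 * eps)

def lgamma_d2(x, eps=1e-4):
    # trigamma via second central difference of lgamma
    return (math.lgamma(x + eps) - 2 * math.lgamma(x) + math.lgamma(x - eps)) / eps ** 2

def risk_sigma_b(sigma, k, reps=40000, seed=1):
    """Monte Carlo Riemannian risk E[2 (ln(sigma_b/sigma))^2] of the Bayes
    estimator sigma_b = sqrt(k/2) exp(-digamma(k/2)/2) * s, with mu0 = 0."""
    rng = random.Random(seed)
    factor = math.sqrt(k / 2) * math.exp(-lgamma_d1(k / 2) / 2)
    total = 0.0
    for _ in range(reps):
        s2 = sum(rng.gauss(0.0, sigma) ** 2 for _ in range(k)) / k
        sigma_b = factor * math.sqrt(s2)
        total += 2 * math.log(sigma_b / sigma) ** 2  # squared Rao distance
    return total / reps

k = 5
predicted = lgamma_d2(k / 2) / 2        # (1/2) * trigamma(k/2)
risk_at_1 = risk_sigma_b(1.0, k)
risk_at_3 = risk_sigma_b(3.0, k, seed=2)
```

That the two estimated risks agree, at very different points of the parameter space, reflects the transitive group action discussed above.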
Therefore it is enough to minimize the risk at any point of the parameter space, for instance at $\sigma = 1$. We want to

determine $\lambda^*$ such that
$$ \lambda^* = \arg\min_{\lambda \in (0, +\infty)} E_1\big( \rho^2(\lambda S, 1) \big). $$
It is easy to obtain
$$ E_1\big( \rho^2(\lambda S, 1) \big) = 2\, E_1\big( (\ln \lambda S)^2 \big) = \frac{1}{2}\, \Psi'\big( \tfrac{k}{2} \big) + 2 \Big( \ln\lambda + \frac{1}{2}\Psi\big( \tfrac{k}{2} \big) + \frac{1}{2}\ln\frac{2}{k} \Big)^2, $$
which attains an absolute minimum at
$$ \lambda^* = \sqrt{\frac{k}{2}}\, \exp\Big( -\frac{1}{2} \Psi\big( \tfrac{k}{2} \big) \Big), $$
so that $\sigma_b$ is the minimum Riemannian risk equivariant estimator. Finally, observe that the results in Lehmann (1951) guarantee the unbiasedness of $\sigma_b$, as we obtained before, since the multiplicative group $\mathbb{R}^+$ is commutative, the induced group is transitive and $\sigma_b$ is the equivariant estimator that uniformly minimizes the Riemannian risk.

### 4.3 Multivariate normal, Σ known

Let us consider now the case when the sample $X_1, \ldots, X_k$ comes from an $n$-variate normal distribution with mean value $\mu$ and known variance-covariance matrix $\Sigma_0$, positive definite. The joint density function can be expressed as
$$ f(x_1, \ldots, x_k; \mu) = (2\pi)^{-\frac{nk}{2}}\, |\Sigma_0|^{-\frac{k}{2}}\, \operatorname{etr}\Big( -\frac{k}{2}\, \Sigma_0^{-1} \big( s^2 + (\bar x - \mu)(\bar x - \mu)^T \big) \Big), $$
where $|A|$ denotes the determinant of a matrix $A$, $\operatorname{etr}(A)$ is equal to the exponential mapping evaluated at the trace of the matrix $A$, $\bar x$ denotes $\frac{1}{k}\sum_{i=1}^k x_i$ and $s^2$ stands for $\frac{1}{k}\sum_{i=1}^k (x_i - \bar x)(x_i - \bar x)^T$. In this case, the metric tensor $G$ coincides with $\Sigma_0^{-1}$.

Assuming the Jeffreys prior distribution for $\mu$, the joint density for $\mu$ and $(X_1, \ldots, X_k) = (x_1, \ldots, x_k)$ is proportional to
$$ |\Sigma_0|^{-\frac{1}{2}} \exp\Big( -\frac{k}{2}\, (\bar x - \mu)^T \Sigma_0^{-1} (\bar x - \mu) \Big). $$
Next we compute the corresponding posterior distribution. Since
$$ \int_{\mathbb{R}^n} |\Sigma_0|^{-\frac{1}{2}} \exp\Big( -\frac{k}{2}\, (\bar x - \mu)^T \Sigma_0^{-1} (\bar x - \mu) \Big)\, d\mu = \Big( \frac{2\pi}{k} \Big)^{\frac{n}{2}}, $$
the posterior distribution $\pi(\cdot \mid \bar x)$ based on the Jeffreys prior is given by
$$ \pi(\mu \mid \bar x) = (2\pi)^{-\frac{n}{2}}\, \big| k^{-1}\Sigma_0 \big|^{-\frac{1}{2}} \exp\Big( -\frac{1}{2}\, (\mu - \bar x)^T \big( k^{-1}\Sigma_0 \big)^{-1} (\mu - \bar x) \Big). $$
