
Calculation of Maximum Entropy Densities with Application to Income Distribution

Ximing Wu[*]

This version: October 2002. In revision at the Journal of Econometrics.

Abstract

This paper shows that there exists a unique maximum entropy density for any finite sample when arithmetic sample moments are used as side conditions. A sequential updating method to calculate the maxent density subject to known moment constraints is proposed. Instead of imposing the moment constraints simultaneously, the sequential updating method incorporates the moment constraints into the calculation from lower to higher moments and updates the density estimates sequentially. The proposed method is employed to approximate the size distribution of U.S. family income. Numerical experiments and empirical evidence demonstrate the efficiency of this method.

JEL Classification: C4; C6; D3.

Keywords: Maximum Entropy; Density Estimation; Sequential Updating; Income Distribution.

[*] Department of Agricultural and Resource Economics, University of California at Berkeley, Berkeley, CA; ximing@are.berkeley.edu. I am very grateful to Amos Golan, George Judge, Jeff LaFrance, Jeff Perloff, Arnold Zellner and two anonymous referees for helpful suggestions and discussions.

1 Introduction

A maximum entropy (maxent) density can be obtained by maximizing Shannon's information entropy measure subject to known moment constraints. According to Jaynes (1957), the maximum entropy distribution is "uniquely determined as the one which is maximally noncommittal with regard to missing information"; it "agrees with what is known, but expresses maximum uncertainty with respect to all other matters." The maximum entropy distribution is the most unbiased distribution that agrees with the given moment constraints, because any deviation from maximum entropy implies a bias (Kapur and Kesavan, 1992).

The maxent approach is a flexible and powerful tool for density approximation: it nests a whole family of generalized exponential distributions, including the exponential, Pareto, normal, lognormal, gamma and beta distributions as special cases. In mathematical statistics, all of the best known distributions are maxent distributions given simple moment constraints (Kapur and Kesavan, 1992). The maxent density has found some applications in econometrics. For example, the Bayesian method of moments (BMOM) uses the maxent technique to estimate the posterior density of parameters of interest (Zellner, 1997; Zellner and Tobias, 2001). An example from the finance literature is the density estimation of derivative assets, where moment constraints are implied by observed option prices (Buchen and Kelly, 1996; Stutzer, 1996; Hawkins, 1997).

Despite its versatility and flexibility, the maxent density has not been widely used in empirical studies. One possible reason is that there is generally no analytical solution for the maxent density problem, and the numerical estimation is rather involved. There are some particular difficulties associated with the numerical solution, which typically requires iterative nonlinear optimization (Zellner and Highfield, 1988; Ormoneit and White, 1999; Rockinger and Jondeau, 2002).[1]

In this study, I discuss the necessary and sufficient condition for a distribution to be uniquely determined by a maxent density. For the purpose of empirical approximation of a size distribution, I show that there exists a unique maxent density for any finite sample when arithmetic moments are used as side conditions. I propose a sequential updating method for the calculation of maxent densities. Compared with the existing studies that consider the estimation of the maxent density subject to just a few

[1] A full scale comparison of all these methods is beyond the scope of this study and therefore is not pursued here.

moment constraints, the proposed method is able to calculate the maxent density associated with a much larger number of moment constraints.

The rest of the paper is organized as follows. Section 2 provides some theoretical background. Section 3 discusses the existing studies dealing with the calculation of maxent densities. Section 4 introduces the sequential updating method. Section 5 applies this method to the approximation of the U.S. family income distribution; traditional specification and goodness-of-fit tests, along with an entropy-based test, are used within the maximum entropy framework for model diagnostics, and the maxent densities are compared with traditional income distributions. Section 6 reports intensive experiments with the proposed method. The last section concludes.

2 The Maxent Density

This section provides some theoretical background on maxent densities. I first discuss the necessary and sufficient condition for a distribution to be uniquely determined by the maxent procedure. I then show that if arithmetic moments are used as side conditions, there exists a unique maxent density for any finite sample, and that a continuous distribution can be approximated arbitrarily well by a maxent density.

The maxent density is typically obtained by maximizing Shannon's entropy (defined relative to the uniform measure),

    W = -\int p(x) \log p(x) \, dx,

subject to some known moment constraints. Following Zellner and Highfield (1988) and Ormoneit and White (1999), we will consider only arithmetic moments of the form

    \int x^i p(x) \, dx = \mu_i, \quad i = 0, 1, \ldots, k.    (1)

Extension to more general moments (e.g., the geometric moments E(\ln^i x) for x > 0) is straightforward (Kapur and Kesavan, 1992; Zellner and Tobias, 2001).

Some distributions cannot be identified by moment constraints. Durrett (1995) gives the condition under which there exists a unique distribution satisfying certain moment conditions.

Theorem 1 Suppose \int x^k \, dF_n(x) has a limit \mu_k for each k and

    \limsup_{k \to \infty} \mu_{2k}^{1/(2k)} / (2k) < \infty,

then F_n converges weakly to the unique distribution with these moments as the sample size n goes to infinity.

Proof. See Durrett (1995), p. 110.

We should focus only on distributions that satisfy this sufficient condition, to ensure a well-defined unique distribution subject to moment constraints. In fact, almost all the distributions used in empirical studies satisfy this condition. Even the Cauchy distribution, which has no finite moments, can be expressed as a maxent density.[2] However, the existence and uniqueness of the underlying distribution subject to certain moment constraints does not immediately imply the existence and uniqueness of the maxent density subject to these moments. A solution is not guaranteed if we use arbitrary combinations of moments as side conditions.[3] Mead and Papanicolaou (1984) give the necessary and sufficient condition on the moments that leads to a unique maxent density. Without loss of generality, we restrict the discussion in the rest of this section to the Hausdorff moment problem, where the moment problem is defined over [0, 1].[4]

Theorem 2 Denote \mu_k as the kth moment of a distribution. If

    \sum_{k=0}^{m} \binom{m}{k} (-1)^k \mu_k > 0, \quad m = 0, 1, 2, \ldots,

there is a unique maxent distribution satisfying these moment constraints.

Proof. See Mead and Papanicolaou (1984), p. 2406.

Fortunately, we find that for empirical calculation of the maxent density it is not necessary to check whether the moments satisfy this condition.

[2] If E[\ln(1 + x^2)] is used as a moment constraint along with \int p(x) \, dx = 1, then the maxent density is

    p(x) = \frac{\Gamma(b)}{\sqrt{\pi} \, \Gamma(b - \frac{1}{2}) \, (1 + x^2)^b}, \quad b > \frac{1}{2},

which includes the Cauchy distribution as a special case when b = 1 (Kapur and Kesavan, 1992).

[3] Rockinger and Jondeau (2002) conducted a bi-dimensional grid search over the skewness-kurtosis domain for standardized moments to locate an authorized domain that leads to a unique maxent density.

[4] We can transform every finite sample to be within [0, 1]. For example, Ormoneit and White (1999) discuss how to transform a moment problem defined on the real line to one on [0, 1]. Therefore, the discussion in this section applies to moment problems outside of the [0, 1] range as well. To transform the density function back to the original location and scale, see Wu (2002) for the formula for affine transformation of maxent densities.
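The rescaling in footnote 4 is simple to implement. The following is a minimal sketch (in Python, which the paper itself does not use; the helper name is ours), with the back-transformation written via the standard change-of-variables formula rather than the specific expressions of Wu (2002):

```python
import numpy as np

def to_unit_interval(x):
    """Affinely map a finite sample into [0, 1]; return (a, b) so a density
    fitted on [0, 1] maps back as p_y(y) = p_u((y - a) / (b - a)) / (b - a)."""
    a, b = float(x.min()), float(x.max())
    return (x - a) / (b - a), (a, b)
```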

In fact, the sample moments of any finite sample satisfy the condition of Theorem 2.

Lemma 1 Denote \hat{\mu}_k as the kth sample moment of a finite sample. Then

    \sum_{k=0}^{m} \binom{m}{k} (-1)^k \hat{\mu}_k > 0, \quad m = 0, 1, 2, \ldots.    (2)

Therefore, there exists a unique maxent density with moments equal to the given sample moments.

Proof. Denote the kth sample moment \hat{\mu}_k = \frac{1}{N} \sum_{n=1}^{N} x_n^k, where x_n \in [0, 1] and the x_n are not all equal to one. Substituting \hat{\mu}_k into Equation (2), one gets

    \sum_{k=0}^{m} \binom{m}{k} (-1)^k \hat{\mu}_k = \frac{1}{N} \sum_{n=1}^{N} (1 - x_n)^m > 0,

since x_n \in [0, 1] for all n and x_n \neq 1 for some n.

By the Weierstrass (Polynomial) Theorem, any continuous function can be approximated arbitrarily well by means of a polynomial. Mead and Papanicolaou (1984) showed that this result can be extended to the approximation of a continuous distribution function by the maxent density, which takes the form of an exponential polynomial when arithmetic moments are used as side conditions.

Theorem 3 Let P(x) be a nonnegative function integrable on [0, 1] whose moments are \mu_0, \mu_1, \ldots, and let p_k(x), k = 1, 2, \ldots, be the maxent density associated with the first k of these moments. If F(x) is some continuous function on [0, 1], then

    \lim_{k \to \infty} \int_0^1 F(x) p_k(x) \, dx = \int_0^1 F(x) P(x) \, dx.

Proof. See Mead and Papanicolaou (1984), p. 2408.

Theorem 3 suggests that for any finite sample, we can use the maxent density to approximate its underlying distribution arbitrarily well.[5]

[5] The sample size should be reasonably large to allow precise estimates of the moments.
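Lemma 1 is easy to verify numerically. In the following sketch (our illustration), the alternating binomial sum of the sample moments collapses, exactly as in the proof, to the sample mean of (1 - x_n)^m:

```python
import numpy as np
from math import comb

def theorem2_condition(x, m_max=12):
    """Check sum_{k=0}^m C(m,k)(-1)^k mu_hat_k > 0 for m = 0, ..., m_max."""
    mu = [np.mean(x ** k) for k in range(m_max + 1)]
    return all(
        sum(comb(m, k) * (-1) ** k * mu[k] for k in range(m + 1)) > 0
        for m in range(m_max + 1)
    )

x = np.random.default_rng(0).random(1000)   # any finite sample in [0, 1]
assert theorem2_condition(x)                # Lemma 1: always holds
# Proof identity: each alternating sum equals np.mean((1 - x) ** m) > 0.
```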

3 Calculation of the Maxent Density

We can use Lagrange's method to solve for the maxent density subject to some moment constraints and obtain the unique global maximum entropy. Denote the Lagrangian

    L = -\int p(x) \log p(x) \, dx - \sum_{i=0}^{k} \lambda_i \left( \int x^i p(x) \, dx - \mu_i \right);    (3)

a simple application of the calculus of variations yields the solution

    p(x) = \exp\left( -\sum_{i=0}^{k} \lambda_i x^i \right).    (4)

Substituting Equation (4) into the normalization constraint \int p(x) \, dx = 1, we obtain

    \int \exp\left( -\sum_{i=0}^{k} \lambda_i x^i \right) dx = 1.

Thus \lambda_0 can be expressed in terms of the remaining Lagrange multipliers:

    e^{\lambda_0} = \int \exp\left( -\sum_{i=1}^{k} \lambda_i x^i \right) dx \equiv Z.

Substituting e^{\lambda_0} into the moment conditions

    \mu_i = \int x^i \exp\left( -\sum_{j=0}^{k} \lambda_j x^j \right) dx = e^{-\lambda_0} \int x^i \exp\left( -\sum_{j=1}^{k} \lambda_j x^j \right) dx,

we have[6]

    \mu_i(\lambda) = \mu_i = \frac{\int x^i \exp\left( -\sum_{j=1}^{k} \lambda_j x^j \right) dx}{\int \exp\left( -\sum_{j=1}^{k} \lambda_j x^j \right) dx}.

Since an analytical solution is not possible for k \geq 2, one must use a nonlinear optimization technique to solve for the maxent density. One way to solve the maxent problem is to transform the constrained optimization problem into an unconstrained one using the dual approach (Golan et al., 1996). Substituting Equation (4) into the Lagrangian

[6] We use boldface to indicate vectors.

(3) and rearranging terms, we obtain the dual objective function for an unconstrained optimization problem:

    \Gamma = \ln Z + \sum_{i=1}^{k} \lambda_i \mu_i.

We can then use Newton's method to solve for the Lagrange multipliers \lambda = [\lambda_1, \ldots, \lambda_k] by iteratively updating

    \lambda^{(1)} = \lambda^{(0)} - H^{-1} \frac{\partial \Gamma}{\partial \lambda},    (5)

where the gradients are

    \frac{\partial \Gamma}{\partial \lambda_i} = \mu_i - \frac{\int x^i \exp\left( -\sum_{j=1}^{k} \lambda_j x^j \right) dx}{\int \exp\left( -\sum_{j=1}^{k} \lambda_j x^j \right) dx} = \mu_i - \mu_i(\lambda), \quad i = 1, 2, \ldots, k,

and the Hessian is

    H_{ij} = \frac{\partial^2 \Gamma}{\partial \lambda_i \partial \lambda_j} = \mu_{i+j}(\lambda) - \mu_i(\lambda) \mu_j(\lambda),    (6)

with

    \mu_{i+j}(\lambda) = \frac{\int x^{i+j} \exp\left( -\sum_{l=1}^{k} \lambda_l x^l \right) dx}{\int \exp\left( -\sum_{l=1}^{k} \lambda_l x^l \right) dx}, \quad i, j = 1, 2, \ldots, k.

Since the dual objective \Gamma is everywhere convex, the Hessian matrix H is positive definite and there exists a unique solution.

An alternative but numerically equivalent way to proceed is to treat the normalization condition, \int \exp\left( -\sum_{i=0}^{k} \lambda_i x^i \right) dx = 1, the same as the other moment conditions. Extending i, j to include zero and replacing \mu_i(\lambda) and H_{ij} respectively with

    \mu_i^0(\lambda) = \int x^i \exp\left( -\sum_{j=0}^{k} \lambda_j x^j \right) dx

and

    H_{ij} = \mu_{i+j}^0(\lambda) = \int x^{i+j} \exp\left( -\sum_{l=0}^{k} \lambda_l x^l \right) dx,

we can apply the same algorithm as in Equation (5) to solve for \lambda = [\lambda_0, \ldots, \lambda_k] iteratively. Both approaches are consistent and efficient (Mead and Papanicolaou, 1984).

Zellner and Highfield (1988) employed Newton's method as described above to solve for the maxent density, using Simpson's rule for numerical integration. The same method was employed by Ormoneit and White (1999, OW henceforth). OW noticed that this method is sensitive to the choice of initial values and only works for a limited set of moment conditions. They suggested two possible reasons: (i) numerical errors may build up during the updating process because of numerical integration; (ii) the Hessian is near singular over a large range of the \lambda space. In their study, OW adopted a more accurate technique for the minimization of integrals of the form \int h(x, \lambda) \, dx with respect to \lambda (Gill et al., 1981) and introduced a backtracking line search into the updating process. However, the set of moment conditions for which OW's algorithm works is also limited. OW tested their algorithm on standardized moment conditions with \mu_3 in [0, 3] and \mu_4 in [\mu_3^2 + 1, 10].[7] They noted that for \mu_3 = 0, their algorithm failed when \mu_4 > 3. Since for \mu_3 = 0 and \mu_4 = 3 the maxent density is the standard normal distribution, this implies that their algorithm will not work for distributions whose first three moments are identical to those of the standard normal distribution but which have larger kurtosis. Also, OW reported numerical problems when \mu_4 > 10, which suggests that their algorithm is applicable only when \mu_3 is in [0, 3), because \mu_3 \geq 3 requires \mu_4 \geq 10.

The modifications proposed by OW may not necessarily lead to a substantial improvement. Mead and Papanicolaou (1984) observed that for maxent density estimation with up to 10-12 moment constraints, the line search is a hindrance rather than an improvement. Also, more accurate numerical integration techniques may offer only limited improvements. Carter (1993) showed that extreme accuracy in computing Hessians and gradients is often not critical in Newton's method, an observation that is confirmed by our study.

[7] The condition \mu_4 > \mu_3^2 + 1 is necessary for the positive definiteness of the moment matrix

    \begin{pmatrix} 1 & \mu_1 & \mu_2 \\ \mu_1 & \mu_2 & \mu_3 \\ \mu_2 & \mu_3 & \mu_4 \end{pmatrix}

when the moments are standardized (\mu_1 = 0, \mu_2 = 1). Rockinger and Jondeau (2002) obtain this boundary numerically. Their finding (\mu_4 > 0.9325\mu_3^2 + \ldots) is very close to the theoretical values, given that they set the range of \mu_3 to be [0, 4].
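To fix ideas before turning to those experiments, here is a minimal sketch of the second, extended variant of the Newton iteration above, with \lambda_0 treated as an ordinary unknown and the integrals computed by Gauss-Legendre quadrature. The function name, the quadrature choice, and the undamped Newton step are our illustration, not a transcription of any of the cited implementations:

```python
import numpy as np

def maxent_newton(mu, lam0=None, lo=0.0, hi=1.0, tol=1e-10,
                  max_iter=100, n_quad=200):
    """Fit p(x) = exp(-sum_i lam[i] x^i) on [lo, hi] whose moments
    int x^i p(x) dx match mu[i], i = 0..k (with mu[0] = 1)."""
    k = len(mu) - 1
    t, w = np.polynomial.legendre.leggauss(n_quad)
    x = lo + 0.5 * (t + 1.0) * (hi - lo)      # nodes mapped to [lo, hi]
    w = 0.5 * (hi - lo) * w
    powers = x[None, :] ** np.arange(2 * k + 1)[:, None]   # x^0 .. x^(2k)
    lam = np.zeros(k + 1) if lam0 is None else np.array(lam0, float)
    for _ in range(max_iter):
        p = np.exp(-lam @ powers[: k + 1])    # density at the nodes
        m = powers @ (w * p)                  # m[i] = int x^i p(x) dx
        r = mu - m[: k + 1]                   # gradient: mu_i - mu_i(lam)
        if np.max(np.abs(r)) < tol:
            return lam
        H = m[np.add.outer(np.arange(k + 1), np.arange(k + 1))]  # Hessian
        lam = lam - np.linalg.solve(H, r)     # Newton step, as in Eq. (5)
    raise RuntimeError("Newton did not converge; try better initial values")
```

The update line is Equation (5) with the normalization condition folded into the system; from a good starting point it converges in a handful of iterations, and the failure branch is where the sequential strategy of the next section takes over.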

We conducted intensive experiments with different numerical integration techniques.[8] We found that the final solution was not affected by the choice of integration technique, although using a more accurate numerical integration technique generally reduced the number of iterations.

4 Sequential Updating of the Maxent Density

In Bayesian analysis and information processing, it is known that the order in which information is incorporated into the learning process is irrelevant.[9] Hence, instead of imposing all the moment constraints simultaneously, we can impose the moment constraints from lower to higher order and update the density estimates sequentially.

As shown in the previous section, solving for the maxent density subject to k moment constraints \mu is equivalent to solving the following system of equations:

    \int x^i \exp\left( -\sum_{j=0}^{k} \lambda_j x^j \right) dx = \mu_i, \quad i = 0, 1, \ldots, k.    (7)

There exists a unique solution if the moment conditions satisfy the necessary and sufficient condition of Theorem 2. Therefore, \mu is a function of \lambda. Denoting \mu = f(\lambda), we know f(\cdot) is a differentiable function, since Equation (7) is everywhere continuous and differentiable in \lambda. By the Inverse Function Theorem, the inverse function \lambda = f^{-1}(\mu) = g(\mu) is also differentiable. Taking a Taylor expansion of \lambda, we obtain

    \lambda = g(\mu^0 + \Delta\mu) \approx g(\mu^0) + g'(\mu^0) \Delta\mu.    (8)

Equation (8) suggests that we can get a first order approximation of the \lambda corresponding to \mu = \mu^0 + \Delta\mu if \lambda^0 = g(\mu^0) is known. By itself this result is not useful, since we do not know the functional form of g(\cdot). For sufficiently small \Delta\mu, one possible approach is to use \lambda^0 as initial values when solving for \lambda = g(\mu) by Newton's method. If \Delta\mu is not small enough, we may not be able to obtain convergence for \lambda = g(\mu) using \lambda^0 as initial values. In this case, we can divide \Delta\mu into a finite number M of small segments such that \Delta\mu = \sum_{i=1}^{M} \Delta\mu_i and solve for \lambda_m = g(\mu^0 + \sum_{i=1}^{m} \Delta\mu_i) using \lambda_{m-1} as initial values, for m = 1, \ldots, M.

[8] The techniques include Simpson's method, Clenshaw-Curtis quadrature, adaptive Gauss-Kronrod quadrature, adaptive double-exponential quadrature, the adaptive Genz-Malik algorithm, and some Monte Carlo and quasi-Monte Carlo methods.

[9] See Zellner (1998) on the order invariance of maximum entropy procedures.

Eventually we can reach the solution for \lambda = g(\mu), as long as \mu satisfies the necessary and sufficient condition of Theorem 2. However, this approach is very inefficient, if not infeasible, because it involves a multi-dimensional grid search when the number of moment constraints is larger than one.

Fortunately, we can reduce the search to one dimension if we choose to impose the moment constraints sequentially. Suppose that for a given finite sample we can solve for \lambda_k = g(\mu_k), where \mu_k collects the first k sample moments, using arbitrary initial values (usually a vector of zeros, to avoid arithmetic overflow). Since higher moments are generally not independent of lower moments, the estimates from lower moments can serve as a close proxy for the maxent density that is also subject to additional higher moments. Thus, if we fail to solve for \lambda_{k+1} = g(\mu_{k+1}) using arbitrary initial values, we can use \lambda^0_{k+1} = [\lambda_k; 0] as initial values. Note that the choice of zero as the initial value for \lambda_{k+1} is not simply for convenience, but is also consistent with the principle of maximum entropy. With only the first k moments incorporated into the estimates, the coefficient \lambda_{k+1} in p(x) = \exp\left( -\sum_{i=0}^{k+1} \lambda_i x^i \right) should be set to zero, since no information has been incorporated for the estimation of \lambda_{k+1}. In other words, if we do not use \mu_{k+1} as a side condition, the term x^{k+1} should not appear in the maxent density function. In this sense, zero is the most honest, or most uninformative, guess for \lambda_{k+1} in terms of information theory.

Corresponding to the most uninformative guess \lambda_{k+1} = 0 is the predicted (k+1)th moment

    \nu_{k+1} = \int x^{k+1} \exp\left( -\sum_{i=0}^{k} \lambda_i x^i \right) dx,

which is the unique maxent predicted value for \mu_{k+1} based on the first k moments.[10] If \nu_{k+1} is close to \mu_{k+1}, the difference \Delta\mu_{k+1} between the vector of actual moments \mu_{k+1} and [\mu_k; \nu_{k+1}] is small. Therefore, if we use \lambda^0_{k+1} = [\lambda_k; 0] as initial values to solve for \lambda_{k+1} = g(\mu_{k+1}), convergence can often be obtained in a few iterations. If we fail to reach the solution using \lambda^0_{k+1} as initial values, we can divide the difference between \nu_{k+1} and \mu_{k+1} into finite small segments and approach the solution in multiple steps, as above. A sketch of this sequential scheme follows.

[10] Maximizing the entropy subject to the first k moments is equivalent to maximizing the entropy subject to the same k moments and the predicted (k+1)th moment \nu_{k+1}. Since \nu_{k+1} is a function of the first k moments, it is not binding when used together with the first k moments as side conditions. Consequently, the Lagrange multiplier \lambda_{k+1} for \nu_{k+1} is zero.
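A minimal sketch of the sequential scheme, built on the maxent_newton sketch from Section 3 (again our own illustration; the segmented fallback between \nu_{k+1} and \mu_{k+1} is indicated in a comment rather than implemented):

```python
import numpy as np

def maxent_sequential(mu, k_start=2):
    """Impose the moments in mu[1:] from lower to higher order, warm-starting
    each Newton solve from the previous solution padded with a zero."""
    lam = maxent_newton(mu[: k_start + 1])        # low-order fit, zeros start
    for k in range(k_start + 1, len(mu)):
        lam = maxent_newton(mu[: k + 1], lam0=np.append(lam, 0.0))
        # If this solve fails, split mu[k] - nu_k into small segments and
        # take several warm-started solves instead of one, as in Eq. (8).
    return lam
```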

We note that the estimation of the maxent density becomes more sensitive to the choice of initial values as the number of moment constraints rises, partially because the Hessian matrix approaches singularity as its dimension increases. Fortunately, the difference between the predicted moment \nu_{k+1} based on the first k moments and the actual moment \mu_{k+1} approaches zero as k increases. This occurs because one can use p_k(x) = \exp\left( -\sum_{i=0}^{k} \lambda_i x^i \right) to approximate the underlying distribution arbitrarily well for sufficiently large k (Theorem 3). The higher k is, the closer p_k(x) is to the underlying distribution, and subsequently the smaller the difference between \mu_{k+1} and the predicted moment \nu_{k+1}. Hence, the sequential method is especially useful when the number of moment constraints is large.

On the other hand, sometimes we do not need to incorporate all the moment conditions. For example, the maxent density subject to the first moment is the exponential distribution, p(x) = \exp(-\lambda_0 - \lambda_1 x), and the maxent density subject to the first two moments is the normal distribution, p(x) = \exp(-\lambda_0 - \lambda_1 x - \lambda_2 x^2). So the first moment is the sufficient statistic for an exponential distribution, and the first two moments are the sufficient statistics for a normal distribution. In this case, the difference between the predicted moment \nu_{k+1} and the actual moment \mu_{k+1} can serve as a useful indicator of whether to impose more moment conditions.

5 Approximation of the U.S. Family Income Distribution

In this section, we apply the sequential method to the approximation of the size distribution of U.S. family income. We run an experiment using U.S. family income data from the 1999 Current Population Survey (CPS) March Supplement. The data consist of 5,000 observations of family income drawn randomly from the 1999 March CPS. We fit the maxent density p(x) = \exp\left( -\sum_{i=0}^{k} \lambda_i x^i \right) for k from 4 to 12, incremented by 2.[11] Newton's method with a vector of zeros as initial values fails to converge when the number of moment constraints is larger than six, and we use the sequential algorithm instead.

Table 1 compares the predicted moment \nu_{k+1} based on the first k moment constraints with the sample moment \mu_{k+1}. As the number of moment constraints increases, the prediction becomes more precise. For k \geq 8, the predicted and actual moments are virtually identical. This suggests that the information content of additional moment conditions is low when the number of moment constraints is sufficiently large.

[11] Typically the income distribution is skewed with an extended right tail, which warrants including at least the first four moments in the estimation. Moreover, we should have an even number of moment conditions to ensure that the density function integrates to unity.
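A sketch of this exercise, using the helpers defined above; since the CPS extract is not reproduced here, a lognormal sample stands in for the income data, and predicted_moment (our name) computes \nu_{k+1} from the current fit for the Table 1 comparison:

```python
import numpy as np
from numpy.polynomial import polynomial as P

def predicted_moment(lam, j, n_quad=400):
    """nu_j = int_0^1 x^j exp(-sum_i lam[i] x^i) dx under the current fit."""
    t, w = np.polynomial.legendre.leggauss(n_quad)
    t, w = 0.5 * (t + 1.0), 0.5 * w
    return float(np.sum(w * t ** j * np.exp(-P.polyval(t, lam))))

income = np.random.default_rng(1).lognormal(10.5, 0.8, 5000)  # stand-in data
x, (a, b) = to_unit_interval(income)
mu = np.array([np.mean(x ** i) for i in range(13)])           # mu_0 .. mu_12

lam = maxent_newton(mu[:5])                    # k = 4, zeros as start
for k in (6, 8, 10, 12):
    nu = predicted_moment(lam, k - 1)          # maxent guess for next moment
    print(f"nu_{k-1} = {nu:.6f}   mu_{k-1} = {mu[k-1]:.6f}")
    lam = maxent_newton(mu[: k + 1], lam0=np.r_[lam, 0.0, 0.0])
```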

For the exponential family, the method of moments estimates are equivalent to maximum likelihood estimates.[12] Hence, we can use the log-likelihood ratio to test the functional specification. Given p(x_j) = \exp\left( -\sum_{i=0}^{k} \lambda_i x_j^i \right) for j = 1, 2, \ldots, N, the log-likelihood can be conveniently calculated as

    L = \sum_{j=1}^{N} \ln p(x_j) = -N \sum_{i=0}^{k} \lambda_i \hat{\mu}_i,

where \hat{\mu}_i is the ith sample moment. Since the maximized entropy for a discrete variable subject to the known moment constraints is W = -\sum_{j=1}^{N} p(x_j) \ln p(x_j) = \sum_{i=0}^{k} \lambda_i \mu_i, the log-likelihood is equivalent to the negative of the maximized entropy multiplied by the number of observations.

The first column of Table 2 lists the log-likelihood for the estimated maxent densities, and the second column reports the log-likelihood ratio of p_{k+2}(x) = \exp\left( -\sum_{i=0}^{k+2} \lambda_i x^i \right) versus p_k(x) = \exp\left( -\sum_{i=0}^{k} \lambda_i x^i \right). This log-likelihood ratio is asymptotically distributed as \chi^2 with one degree of freedom (critical value 3.84 at the 5% significance level). The log-likelihood ratio test favors the more general model p_{k+2}(x) over our range of k.

Soofi et al. (1995) argue that the information discrepancy between two distributions can be measured in terms of their entropy difference. They define an index for comparing two distributions:

    ID(p, p_0) = 1 - \exp\left( -K(p : p_0) \right),

where K(p : p_0) = \int p(x) \ln \frac{p(x)}{p_0(x)} \, dx is the relative entropy, or Kullback-Leibler distance, an information-theoretic measure of the discrepancy between two distributions. The third column of Table 2 reports the ID indices between p_{k+2}(x) and p_k(x). We can see that the discrepancy decreases as more moment conditions enter the estimation. This suggests that as the number of moment conditions gets large, the information content of an additional moment decreases.

We test the goodness-of-fit of the maxent density estimates using a two-sided Kolmogorov-Smirnov (KS) test. The fourth column of Table 2 reports the KS statistics of the estimated maxent densities. The critical value of the KS test at the 5% significance level is 0.0192 for our sample. Thus, the KS test fails to reject the null hypothesis that our income sample is distributed as p_k(x) = \exp\left( -\sum_{i=0}^{k} \lambda_i x^i \right) for k = 8, 10, 12.

To avoid overfitting, we calculate the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) to check the balance between the accuracy of the estimation and the rule of parsimony.

[12] This maximum entropy method is equivalent to the ML approach where the likelihood is defined over the exponential distribution with k parameters. Golan et al. (1996) use a duality theorem to show this relationship.
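These diagnostics are all inexpensive to compute from the fitted \lambda and the sample moments. A short sketch under the same conventions as before, treating the number of moment constraints k as the parameter count in AIC and BIC (our reading of the setup):

```python
import numpy as np

def diagnostics(lam, mu, n_obs):
    """Log-likelihood L = -N * sum_i lam[i] mu[i] (with mu[0] = 1), AIC, BIC."""
    loglik = -n_obs * float(np.dot(lam, mu[: len(lam)]))
    k = len(lam) - 1                      # number of moment constraints
    aic = -2.0 * loglik + 2.0 * k
    bic = -2.0 * loglik + k * np.log(n_obs)
    return loglik, aic, bic

def lr_stat(loglik_k, loglik_k2):
    """Likelihood ratio statistic for p_{k+2}(x) versus p_k(x)."""
    return 2.0 * (loglik_k2 - loglik_k)
```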

The results are reported in the fifth and sixth columns of Table 2. The AIC favors the model with 12 moment constraints. The BIC, which carries a greater complexity penalty, favors the model with the first eight moment constraints.

Lastly, we compare the maxent densities with two conventional income distributions, fitting a lognormal distribution and a gamma distribution to the income sample.[13] The relevant tests are reported in the last two rows of Table 2. Both distributions fail the KS test and are outperformed by our preferred maxent densities in all the tests.

Figure 1 reports the histogram of the income sample and the estimated p(x) = \exp\left( -\sum_{i=0}^{12} \lambda_i x^i \right). The fitted density closely resembles the shape of the histogram of the income sample. Although the domain over which the density is evaluated is substantially wider than the sample range at either end, the estimated density demonstrates good performance in both tails.

6 Further Numerical Experiments

Without loss of generality, we apply the sequential updating method to standardized moments with \mu_3 in [0, 4] and \mu_4 in [\mu_3^2 + 1, 20], incremented by 0.1. This range of values for \mu_3 and \mu_4 is broader than that considered by Ormoneit and White (1999), where \mu_3 is in [0, 3] and \mu_4 in [\mu_3^2 + 1, 10].[14] The fitted moments of all the estimated densities agree with the actual moments to at least 12 decimal places, which demonstrates that the proposed algorithm is extremely precise.

When \mu_3 = 0 and \mu_4 = 3, the maxent density is the standard normal distribution. The theoretical values for the standard normal distribution are \lambda_0 = \frac{1}{2} \log(2\pi), \lambda_1 = \lambda_3 = \lambda_4 = 0 and \lambda_2 = \frac{1}{2}, since

    \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{x^2}{2} \right) = \exp\left( -\frac{1}{2} x^2 - \frac{\log(2\pi)}{2} \right).

Using the theoretical values as a benchmark, the estimated \hat{\lambda} for \mu_3 = 0 and \mu_4 = 3 are accurate to at least 15 decimal places (a sketch of this check appears below).

The estimated \hat{\lambda}_i, i = 1, 2, 3, 4, are plotted against the values of \mu_3 and \mu_4 in Figures 2 and 3. Their patterns closely resemble those reported by Ormoneit and White (1999) for a smaller range of \mu_3 and \mu_4. The estimated density functions for [\mu_3 = 0, \mu_4 = 1.1], [\mu_3 = 0, \mu_4 = 3] (the standard normal), and [\mu_3 = 4, \mu_4 = 20] are plotted in Figure 4.

[13] The lognormal distribution and gamma distribution are in fact maxent densities subject to certain geometric moment constraints.

[14] All the densities in this section are evaluated over [-20, 20].
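The standard normal benchmark is a two-line check given the maxent_newton sketch from Section 3; here the density is integrated over [-20, 20] as in footnote 14, and seeding with the exact k = 2 solution padded with zeros mirrors the sequential logic: since \nu_3 = 0 and \nu_4 = 3 already equal the target moments, Newton's method converges immediately (names and setup are ours):

```python
import numpy as np

mu = np.array([1.0, 0.0, 1.0, 0.0, 3.0])      # mu_0..mu_4, standardized
start = np.array([0.5 * np.log(2 * np.pi), 0.0, 0.5, 0.0, 0.0])
lam = maxent_newton(mu, lam0=start, lo=-20.0, hi=20.0)
print(np.abs(lam - start).max())              # ~0: the fit is standard normal
```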

The Figure 4 example demonstrates the maxent density's flexibility in handling multimodal distributions.[15] Consistent with the finding of Rockinger and Jondeau (2002), we find that for small kurtosis the density is squeezed toward the center. In fact, as we can see from Figure 4, the density becomes bimodal for sufficiently small kurtosis. On the other hand, a small mode appears in the tail of the distribution to accommodate large skewness when kurtosis is relatively small.

The fact that the \lambda space is rather flat over almost the entire region, except near the boundary \mu_4 = \mu_3^2 + 1, further justifies our method of approaching the optimum by adding moment conditions sequentially. Although the functional form of \lambda = g(\mu) is unknown, the flatness of the \lambda space suggests that g'(\cdot) is close to zero. Therefore, even for moderately large \Delta\mu, the Taylor approximation \lambda = g(\mu^0) + g'(\mu^0)\Delta\mu can be very accurate over most of the \lambda space. Consequently, using \lambda^0 = g(\mu^0) as initial values to solve for \lambda = g(\mu) should be easy, since \lambda \approx \lambda^0.

7 Conclusion

The maximum entropy (maxent) approach is a powerful and flexible tool for density estimation, which nests most of the commonly used distributions as special cases. In this paper, I discuss the necessary and sufficient conditions for a distribution to be uniquely identified by a maxent density. I show that there exists a unique maxent density for any finite sample when arithmetic sample moments are used as side conditions.

The calculation of a maxent density subject to multiple moment constraints is quite sensitive to the choice of initial values, and the problem becomes more difficult as the number of moment constraints increases. I propose a sequential updating method for maximum entropy density calculation. Instead of imposing the moment constraints simultaneously, this method incorporates the information contained in the moments into the estimation process sequentially, from lower to higher moments. Consistent with the maximum entropy principle, I use the estimated coefficients based on lower moments as initial values to update the density estimates when higher order moment constraints are imposed.

I apply the proposed method to approximate the size distribution of 1999 U.S. family income. Traditional specification and goodness-of-fit tests, along with an entropy-based test, are used within the maximum entropy framework for model diagnostics. The maxent densities are compared with traditional income distributions and shown to outperform

[15] The possible number of modes is determined by the number of moments used and their corresponding coefficients \lambda (Cobb et al., 1983).

them in all tests. Empirical examples and intensive numerical experiments suggest that the maximum entropy approach is a powerful tool for the empirical approximation of size distributions, and that the proposed sequential updating method is efficient in calculating a wide variety of maxent densities.

References

Buchen, P., M. Kelly, 1996. The maximum entropy distribution of an asset inferred from option prices. Journal of Financial and Quantitative Analysis 31(1).

Carter, R. G., 1993. Numerical experience with a class of algorithms for nonlinear optimization using inexact function and gradient information. SIAM Journal on Scientific Computing 14.

Cobb, L., P. Koppstein, N. H. Chen, 1983. Estimation and moment recursion relations for multimodal distributions of the exponential family. Journal of the American Statistical Association 78.

Durrett, R., 1995. Probability: Theory and Examples, 2nd edn. Duxbury Press.

Gill, P. E., W. Murray, M. H. Wright, 1981. Practical Optimization. Academic Press, San Diego.

Golan, A., G. Judge, D. Miller, 1996. Maximum Entropy Econometrics: Robust Estimation with Limited Data. John Wiley and Sons, New York.

Hawkins, R., 1997. Maximum entropy and derivative securities. Advances in Econometrics 12.

Jaynes, E. T., 1957. Information theory and statistical mechanics. Physical Review 106.

Kapur, J. N., H. K. Kesavan, 1992. Entropy Optimization Principles with Applications. Academic Press.

Mead, L. R., N. Papanicolaou, 1984. Maximum entropy in the problem of moments. Journal of Mathematical Physics 25(8).

Miller, L. H., 1956. Table of percentage points of Kolmogorov statistics. Journal of the American Statistical Association 51(273).

Ormoneit, D., H. White, 1999. An efficient algorithm to compute maximum entropy densities. Econometric Reviews 18(2).

Owen, A., 1991. Empirical likelihood for linear models. The Annals of Statistics 19.

Qin, J., J. Lawless, 1994. Empirical likelihood and general estimating equations. The Annals of Statistics 22.

Rockinger, M., E. Jondeau, 2002. Entropy densities with an application to autoregressive conditional skewness and kurtosis. Journal of Econometrics 106.

Soofi, E., N. Ebrahimi, M. Habibullah, 1995. Information distinguishability with application to analysis of failure data. Journal of Econometrics 90.

Stutzer, M., 1996. A simple nonparametric approach to derivative security valuation. Journal of Finance 51(5).

Wu, X., 2002. Formula for affine transformation of maxent densities. Unpublished manuscript, University of California, Berkeley.

Zellner, A., 1997. The Bayesian method of moments (BMOM): theory and applications. Advances in Econometrics 12.

Zellner, A., 1998. On order invariance of maximum entropy procedures. Unpublished manuscript, Graduate School of Business, University of Chicago.

Zellner, A., R. A. Highfield, 1988. Calculation of maximum entropy distributions and approximation of marginal posterior distributions. Journal of Econometrics 37.

Zellner, A., J. Tobias, 2001. Further results on Bayesian method of moments analysis of the multiple regression model. International Economic Review 42(1).

Table 1: Sample moments \mu_{k+1}, predicted moments \nu_{k+1}, and their difference \nu_{k+1} - \mu_{k+1}, for the fitted values of k.

Table 2: Specification and goodness-of-fit tests for estimated maxent densities, for k = 4, 6, 8, 10, 12 and for the fitted lognormal and gamma distributions.
Columns: (1) log-likelihood; (2) log-likelihood ratio test, p_{k+2}(x) versus p_k(x); (3) Soofi et al. (1995)'s ID index, p_{k+2}(x) versus p_k(x); (4) Kolmogorov-Smirnov test; (5) Akaike Information Criterion; (6) Bayesian Information Criterion.

Figure 1: Histogram and estimated maxent density based on the first 12 moments of 1999 family income; x-axis in $1,000.

Figure 2: \hat{\lambda}_1 (top) and \hat{\lambda}_2 (bottom) for \mu_3 in [0, 4] and \mu_4 in [\mu_3^2 + 1, 20].

Figure 3: \hat{\lambda}_3 (top) and \hat{\lambda}_4 (bottom) for \mu_3 in [0, 4] and \mu_4 in [\mu_3^2 + 1, 20].

Figure 4: Estimated maxent density for [\mu_3 = 0, \mu_4 = 1.1] (dashed), [\mu_3 = 0, \mu_4 = 3] (solid), and [\mu_3 = 4, \mu_4 = 20] (dotted).


More information

Analysis of Bayesian Dynamic Linear Models

Analysis of Bayesian Dynamic Linear Models Analysis of Bayesian Dynamic Linear Models Emily M. Casleton December 17, 2010 1 Introduction The main purpose of this project is to explore the Bayesian analysis of Dynamic Linear Models (DLMs). The main

More information

HOMEWORK 5 SOLUTIONS. n!f n (1) lim. ln x n! + xn x. 1 = G n 1 (x). (2) k + 1 n. (n 1)!

HOMEWORK 5 SOLUTIONS. n!f n (1) lim. ln x n! + xn x. 1 = G n 1 (x). (2) k + 1 n. (n 1)! Math 7 Fall 205 HOMEWORK 5 SOLUTIONS Problem. 2008 B2 Let F 0 x = ln x. For n 0 and x > 0, let F n+ x = 0 F ntdt. Evaluate n!f n lim n ln n. By directly computing F n x for small n s, we obtain the following

More information

5 Numerical Differentiation

5 Numerical Differentiation D. Levy 5 Numerical Differentiation 5. Basic Concepts This chapter deals with numerical approximations of derivatives. The first questions that comes up to mind is: why do we need to approximate derivatives

More information

Variables Control Charts

Variables Control Charts MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. Variables

More information

Lecture Notes on Elasticity of Substitution

Lecture Notes on Elasticity of Substitution Lecture Notes on Elasticity of Substitution Ted Bergstrom, UCSB Economics 210A March 3, 2011 Today s featured guest is the elasticity of substitution. Elasticity of a function of a single variable Before

More information

DRAFT. Further mathematics. GCE AS and A level subject content

DRAFT. Further mathematics. GCE AS and A level subject content Further mathematics GCE AS and A level subject content July 2014 s Introduction Purpose Aims and objectives Subject content Structure Background knowledge Overarching themes Use of technology Detailed

More information

Machine Learning and Pattern Recognition Logistic Regression

Machine Learning and Pattern Recognition Logistic Regression Machine Learning and Pattern Recognition Logistic Regression Course Lecturer:Amos J Storkey Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh Crichton Street,

More information

Notes on Probability and Statistics

Notes on Probability and Statistics Notes on Probability and Statistics Andrew Forrester January 28, 2009 Contents 1 The Big Picture 1 2 Counting with Combinatorics 2 2.1 Possibly Useful Notation...................................... 2 2.2

More information

BANACH AND HILBERT SPACE REVIEW

BANACH AND HILBERT SPACE REVIEW BANACH AND HILBET SPACE EVIEW CHISTOPHE HEIL These notes will briefly review some basic concepts related to the theory of Banach and Hilbert spaces. We are not trying to give a complete development, but

More information

In this section, we will consider techniques for solving problems of this type.

In this section, we will consider techniques for solving problems of this type. Constrained optimisation roblems in economics typically involve maximising some quantity, such as utility or profit, subject to a constraint for example income. We shall therefore need techniques for solving

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models

More information

CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a

More information

THREE DIMENSIONAL GEOMETRY

THREE DIMENSIONAL GEOMETRY Chapter 8 THREE DIMENSIONAL GEOMETRY 8.1 Introduction In this chapter we present a vector algebra approach to three dimensional geometry. The aim is to present standard properties of lines and planes,

More information

Support Vector Machines Explained

Support Vector Machines Explained March 1, 2009 Support Vector Machines Explained Tristan Fletcher www.cs.ucl.ac.uk/staff/t.fletcher/ Introduction This document has been written in an attempt to make the Support Vector Machines (SVM),

More information

Understanding the Impact of Weights Constraints in Portfolio Theory

Understanding the Impact of Weights Constraints in Portfolio Theory Understanding the Impact of Weights Constraints in Portfolio Theory Thierry Roncalli Research & Development Lyxor Asset Management, Paris thierry.roncalli@lyxor.com January 2010 Abstract In this article,

More information

Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

More information

12.5: CHI-SQUARE GOODNESS OF FIT TESTS

12.5: CHI-SQUARE GOODNESS OF FIT TESTS 125: Chi-Square Goodness of Fit Tests CD12-1 125: CHI-SQUARE GOODNESS OF FIT TESTS In this section, the χ 2 distribution is used for testing the goodness of fit of a set of data to a specific probability

More information

Inequality, Mobility and Income Distribution Comparisons

Inequality, Mobility and Income Distribution Comparisons Fiscal Studies (1997) vol. 18, no. 3, pp. 93 30 Inequality, Mobility and Income Distribution Comparisons JOHN CREEDY * Abstract his paper examines the relationship between the cross-sectional and lifetime

More information

Section 4.4 Inner Product Spaces

Section 4.4 Inner Product Spaces Section 4.4 Inner Product Spaces In our discussion of vector spaces the specific nature of F as a field, other than the fact that it is a field, has played virtually no role. In this section we no longer

More information

Mathematical finance and linear programming (optimization)

Mathematical finance and linear programming (optimization) Mathematical finance and linear programming (optimization) Geir Dahl September 15, 2009 1 Introduction The purpose of this short note is to explain how linear programming (LP) (=linear optimization) may

More information

Properties of sequences Since a sequence is a special kind of function it has analogous properties to functions:

Properties of sequences Since a sequence is a special kind of function it has analogous properties to functions: Sequences and Series A sequence is a special kind of function whose domain is N - the set of natural numbers. The range of a sequence is the collection of terms that make up the sequence. Just as the word

More information

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data

More information