
Calculation of Maximum Entropy Densities with Application to Income Distribution

Ximing Wu[*]

This version: October 2002. In revision at the Journal of Econometrics.

Abstract

This paper shows that there exists a unique maximum entropy density for any finite sample when arithmetic sample moments are used as side conditions. A sequential updating method to calculate the maxent density subject to known moment constraints is proposed. Instead of imposing the moment constraints simultaneously, the sequential updating method incorporates the moment constraints into the calculation from lower to higher moments and updates the density estimates sequentially. The proposed method is employed to approximate the size distribution of U.S. family income. Numerical experiments and empirical evidence demonstrate the efficiency of this method.

JEL Classification: C4; C6; D3.

Keywords: Maximum Entropy; Density Estimation; Sequential Updating; Income Distribution.

[*] Department of Agricultural and Resource Economics, University of California at Berkeley, Berkeley, CA; ximing@are.berkeley.edu. I am very grateful to Amos Golan, George Judge, Jeff LaFrance, Jeff Perloff, Arnold Zellner and two anonymous referees for helpful suggestions and discussions.

1 Introduction

A maximum entropy (maxent) density can be obtained by maximizing Shannon's information entropy measure subject to known moment constraints. According to Jaynes (1957), the maximum entropy distribution is "uniquely determined as the one which is maximally noncommittal with regard to missing information"; it "agrees with what is known, but expresses maximum uncertainty with respect to all other matters." The maximum entropy distribution is the most unbiased distribution that agrees with the given moment constraints, because any deviation from maximum entropy implies a bias (Kapur and Kesavan, 1992).

The maxent approach is a flexible and powerful tool for density approximation: it nests a whole family of generalized exponential distributions, including the exponential, Pareto, normal, lognormal, gamma and beta distributions as special cases. In mathematical statistics, all of the best known distributions are maxent distributions given simple moment constraints (Kapur and Kesavan, 1992). The maxent density has found some applications in econometrics. For example, the Bayesian method of moments (BMOM) uses the maxent technique to estimate the posterior density of parameters of interest (Zellner, 1997; Zellner and Tobias, 2001). An example from the finance literature is the density estimation of derivative assets, where moment constraints are implied by observed option prices (Buchen and Kelly, 1996; Stutzer, 1996; Hawkins, 1997).

Despite its versatility and flexibility, the maxent density has not been widely used in empirical studies. One possible reason is that there is generally no analytical solution for the maxent density problem, and the numerical estimation is rather involved. There are some particular difficulties associated with the numerical solution, which typically requires iterative nonlinear optimization (Zellner and Highfield, 1988; Ormoneit and White, 1999; Rockinger and Jondeau, 2002).[1]

In this study, I discuss the necessary and sufficient condition for a distribution to be uniquely determined by a maxent density. For the purpose of empirical approximation of a size distribution, I show that there exists a unique maxent density for any finite sample when arithmetic moments are used as side conditions. I propose a sequential updating method for the calculation of maxent densities. Compared with the existing studies that consider the estimation of the maxent density subject to just a few

[1] A full scale comparison of all these methods is beyond the scope of this study and therefore is not pursued here.

moment constraints, the proposed method is able to calculate the maxent density associated with a much larger number of moment constraints.

The rest of the paper is organized as follows. Section 2 provides some theoretical background. Section 3 discusses the existing studies dealing with the calculation of maxent densities. Section 4 introduces the sequential updating method. Section 5 applies this method to the approximation of the U.S. family income distribution; traditional specification and goodness-of-fit tests, along with an entropy-based test, are used within the maximum entropy framework for model diagnostics, and the maxent densities are compared with traditional income distributions. Section 6 reports intensive experiments with the proposed method. The last section concludes.

2 The Maxent Density

This section provides some theoretical background on maxent densities. I first discuss the necessary and sufficient condition for a distribution to be uniquely determined by the maxent procedure. I then show that if arithmetic moments are used as side conditions, there exists a unique maxent density for any finite sample, and that a continuous distribution can be approximated arbitrarily well by a maxent density.

The maxent density is typically obtained by maximizing Shannon's entropy (defined relative to the uniform measure),

    W = -\int p(x) \log p(x) \, dx,

subject to some known moment constraints. Following Zellner and Highfield (1988) and Ormoneit and White (1999), we will consider only arithmetic moments of the form

    \int x^i p(x) \, dx = \mu_i, \quad i = 0, 1, \ldots, k.    (1)

Extension to more general moments (e.g., the geometric moments E(\ln^i x) for x > 0) is straightforward (Kapur and Kesavan, 1992; Zellner and Tobias, 2001).

Some distributions cannot be identified by moment constraints. Durrett (1995) gives the condition under which there exists a unique distribution satisfying certain moment conditions.

Theorem 1 Suppose \int x^k \, dF_n(x) has a limit \mu_k for each k and

    \limsup_{k \to \infty} \mu_{2k}^{1/(2k)} / (2k) < \infty,

then F_n converges weakly to the unique distribution with these moments as the sample size n goes to infinity.

Proof. See Durrett (1995), p. 110.

We should focus only on distributions that satisfy this sufficient condition, to ensure a well-defined unique distribution subject to moment constraints. In fact, almost all the distributions used in empirical studies satisfy this condition. Even the Cauchy distribution, which has no finite moments, can be expressed as a maxent density.[2] However, the existence and uniqueness of the underlying distribution subject to certain moment constraints does not immediately imply the existence and uniqueness of the maxent density subject to these moments. A solution is not guaranteed if we use arbitrary combinations of moments as side conditions.[3] Mead and Papanicolaou (1984) give the necessary and sufficient condition on the moments that leads to a unique maxent density. Without loss of generality, we restrict the discussion in the rest of this section to the Hausdorff moment problem, where the moment problem is defined over [0, 1].[4]

Theorem 2 Denote \mu_k as the kth moment of a distribution. If

    \sum_{k=0}^{m} \binom{m}{k} (-1)^k \mu_k > 0, \quad m = 0, 1, 2, \ldots,

there is a unique maxent distribution satisfying these moment constraints.

Proof. See Mead and Papanicolaou (1984), p. 2406.

Fortunately, we find that for empirical calculation of the maxent density it is not necessary to check whether the moments satisfy this condition.

[2] If E[\ln(1 + x^2)] is used as a moment constraint along with \int p(x) \, dx = 1, then the maxent density is

    p(x) = \frac{\Gamma(b)}{\sqrt{\pi} \, \Gamma(b - \frac{1}{2}) \, (1 + x^2)^b}, \quad b > \frac{1}{2},

which includes the Cauchy distribution as a special case when b = 1 (Kapur and Kesavan, 1992).

[3] Rockinger and Jondeau (2002) conducted a bi-dimensional grid search over the skewness-kurtosis domain for standardized moments to locate an authorized domain that leads to a unique maxent density.

[4] We can transform every finite sample to be within [0, 1]. For example, Ormoneit and White (1999) discuss how to transform a moment problem defined on the real line to one on [0, 1]. Therefore, the discussion in this section applies to moment problems outside of the [0, 1] range as well. To transform the density function back to the original location and scale, see Wu (2002) for the formula for affine transformation of maxent densities.
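The rescaling in footnote 4 is simple to implement. The following is a minimal sketch (in Python, which the paper itself does not use; the helper name is ours), with the back-transformation written via the standard change-of-variables formula rather than the specific expressions of Wu (2002):

```python
import numpy as np

def to_unit_interval(x):
    """Affinely map a finite sample into [0, 1]; return (a, b) so a density
    fitted on [0, 1] maps back as p_y(y) = p_u((y - a) / (b - a)) / (b - a)."""
    a, b = float(x.min()), float(x.max())
    return (x - a) / (b - a), (a, b)
```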

In fact, the sample moments of any finite sample satisfy the condition of Theorem 2.

Lemma 1 Denote \hat{\mu}_k as the kth sample moment of a finite sample. Then

    \sum_{k=0}^{m} \binom{m}{k} (-1)^k \hat{\mu}_k > 0, \quad m = 0, 1, 2, \ldots.    (2)

Therefore, there exists a unique maxent density with moments equal to the given sample moments.

Proof. Denote the kth sample moment \hat{\mu}_k = \frac{1}{N} \sum_{n=1}^{N} x_n^k, where x_n \in [0, 1] and the x_n are not all equal to one. Substituting \hat{\mu}_k into Equation (2), one gets

    \sum_{k=0}^{m} \binom{m}{k} (-1)^k \hat{\mu}_k = \frac{1}{N} \sum_{n=1}^{N} (1 - x_n)^m > 0,

since x_n \in [0, 1] for all n and x_n \neq 1 for some n.

By the Weierstrass (Polynomial) Theorem, any continuous function can be approximated arbitrarily well by means of a polynomial. Mead and Papanicolaou (1984) showed that this result can be extended to the approximation of a continuous distribution function by the maxent density, which takes the form of an exponential polynomial when arithmetic moments are used as side conditions.

Theorem 3 Let P(x) be a nonnegative function integrable on [0, 1] whose moments are \mu_0, \mu_1, \ldots, and let p_k(x), k = 1, 2, \ldots, be the maxent density associated with the first k of these moments. If F(x) is some continuous function on [0, 1], then

    \lim_{k \to \infty} \int_0^1 F(x) p_k(x) \, dx = \int_0^1 F(x) P(x) \, dx.

Proof. See Mead and Papanicolaou (1984), p. 2408.

Theorem 3 suggests that for any finite sample, we can use the maxent density to approximate its underlying distribution arbitrarily well.[5]

[5] The sample size should be reasonably large to allow precise estimates of the moments.
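Lemma 1 is easy to verify numerically. In the following sketch (our illustration), the alternating binomial sum of the sample moments collapses, exactly as in the proof, to the sample mean of (1 - x_n)^m:

```python
import numpy as np
from math import comb

def theorem2_condition(x, m_max=12):
    """Check sum_{k=0}^m C(m,k)(-1)^k mu_hat_k > 0 for m = 0, ..., m_max."""
    mu = [np.mean(x ** k) for k in range(m_max + 1)]
    return all(
        sum(comb(m, k) * (-1) ** k * mu[k] for k in range(m + 1)) > 0
        for m in range(m_max + 1)
    )

x = np.random.default_rng(0).random(1000)   # any finite sample in [0, 1]
assert theorem2_condition(x)                # Lemma 1: always holds
# Proof identity: each alternating sum equals np.mean((1 - x) ** m) > 0.
```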

3 Calculation of the Maxent Density

We can use Lagrange's method to solve for the maxent density subject to some moment constraints and obtain the unique global maximum entropy. Denote the Lagrangian

    L = -\int p(x) \log p(x) \, dx - \sum_{i=0}^{k} \lambda_i \left( \int x^i p(x) \, dx - \mu_i \right);    (3)

a simple application of the calculus of variations yields the solution

    p(x) = \exp\left( -\sum_{i=0}^{k} \lambda_i x^i \right).    (4)

Substituting Equation (4) into the normalization constraint \int p(x) \, dx = 1, we obtain

    \int \exp\left( -\sum_{i=0}^{k} \lambda_i x^i \right) dx = 1.

Thus \lambda_0 can be expressed in terms of the remaining Lagrange multipliers:

    e^{\lambda_0} = \int \exp\left( -\sum_{i=1}^{k} \lambda_i x^i \right) dx \equiv Z.

Substituting e^{\lambda_0} into the moment conditions

    \mu_i = \int x^i \exp\left( -\sum_{j=0}^{k} \lambda_j x^j \right) dx = e^{-\lambda_0} \int x^i \exp\left( -\sum_{j=1}^{k} \lambda_j x^j \right) dx,

we have[6]

    \mu_i(\lambda) = \mu_i = \frac{\int x^i \exp\left( -\sum_{j=1}^{k} \lambda_j x^j \right) dx}{\int \exp\left( -\sum_{j=1}^{k} \lambda_j x^j \right) dx}.

Since an analytical solution is not possible for k \geq 2, one must use a nonlinear optimization technique to solve for the maxent density. One way to solve the maxent problem is to transform the constrained optimization problem into an unconstrained one using the dual approach (Golan et al., 1996). Substituting Equation (4) into the Lagrangian

[6] We use boldface to indicate vectors.

(3) and rearranging terms, we obtain the dual objective function for an unconstrained optimization problem:

    \Gamma = \ln Z + \sum_{i=1}^{k} \lambda_i \mu_i.

We can then use Newton's method to solve for the Lagrange multipliers \lambda = [\lambda_1, \ldots, \lambda_k] by iteratively updating

    \lambda^{(1)} = \lambda^{(0)} - H^{-1} \frac{\partial \Gamma}{\partial \lambda},    (5)

where the gradients are

    \frac{\partial \Gamma}{\partial \lambda_i} = \mu_i - \frac{\int x^i \exp\left( -\sum_{j=1}^{k} \lambda_j x^j \right) dx}{\int \exp\left( -\sum_{j=1}^{k} \lambda_j x^j \right) dx} = \mu_i - \mu_i(\lambda), \quad i = 1, 2, \ldots, k,

and the Hessian is

    H_{ij} = \frac{\partial^2 \Gamma}{\partial \lambda_i \partial \lambda_j} = \mu_{i+j}(\lambda) - \mu_i(\lambda) \mu_j(\lambda),    (6)

with

    \mu_{i+j}(\lambda) = \frac{\int x^{i+j} \exp\left( -\sum_{l=1}^{k} \lambda_l x^l \right) dx}{\int \exp\left( -\sum_{l=1}^{k} \lambda_l x^l \right) dx}, \quad i, j = 1, 2, \ldots, k.

Since the dual objective \Gamma is everywhere convex, the Hessian matrix H is positive definite and there exists a unique solution.

An alternative but numerically equivalent way to proceed is to treat the normalization condition, \int \exp\left( -\sum_{i=0}^{k} \lambda_i x^i \right) dx = 1, the same as the other moment conditions. Extending i, j to include zero and replacing \mu_i(\lambda) and H_{ij} respectively with

    \mu_i^0(\lambda) = \int x^i \exp\left( -\sum_{j=0}^{k} \lambda_j x^j \right) dx

and

    H_{ij} = \mu_{i+j}^0(\lambda) = \int x^{i+j} \exp\left( -\sum_{l=0}^{k} \lambda_l x^l \right) dx,

we can apply the same algorithm as in Equation (5) to solve for \lambda = [\lambda_0, \ldots, \lambda_k] iteratively. Both approaches are consistent and efficient (Mead and Papanicolaou, 1984).

Zellner and Highfield (1988) employed Newton's method as described above to solve for the maxent density, using Simpson's rule for numerical integration. The same method was employed by Ormoneit and White (1999, OW henceforth). OW noticed that this method is sensitive to the choice of initial values and only works for a limited set of moment conditions. They suggested two possible reasons: (i) numerical errors may build up during the updating process because of numerical integration; (ii) the Hessian is near singular over a large range of the \lambda space. In their study, OW adopted a more accurate technique for the minimization of integrals of the form \int h(x, \lambda) \, dx with respect to \lambda (Gill et al., 1981) and introduced a backtracking line search into the updating process. However, the set of moment conditions for which OW's algorithm works is also limited. OW tested their algorithm on standardized moment conditions with \mu_3 in [0, 3] and \mu_4 in [\mu_3^2 + 1, 10].[7] They noted that for \mu_3 = 0, their algorithm failed when \mu_4 > 3. Since for \mu_3 = 0 and \mu_4 = 3 the maxent density is the standard normal distribution, this implies that their algorithm will not work for distributions whose first three moments are identical to those of the standard normal distribution but which have larger kurtosis. Also, OW reported numerical problems when \mu_4 > 10, which suggests that their algorithm is applicable only when \mu_3 is in [0, 3), because \mu_3 \geq 3 requires \mu_4 \geq 10.

The modifications proposed by OW may not necessarily lead to a substantial improvement. Mead and Papanicolaou (1984) observed that for maxent density estimation with up to 10-12 moment constraints, the line search is a hindrance rather than an improvement. Also, more accurate numerical integration techniques may offer only limited improvements. Carter (1993) showed that extreme accuracy in computing Hessians and gradients is often not critical in Newton's method, an observation that is confirmed by our study.

[7] The condition \mu_4 > \mu_3^2 + 1 is necessary for the positive definiteness of the moment matrix

    \begin{pmatrix} 1 & \mu_1 & \mu_2 \\ \mu_1 & \mu_2 & \mu_3 \\ \mu_2 & \mu_3 & \mu_4 \end{pmatrix}

when the moments are standardized (\mu_1 = 0, \mu_2 = 1). Rockinger and Jondeau (2002) obtain this boundary numerically. Their finding (\mu_4 > 0.9325\mu_3^2 + \ldots) is very close to the theoretical values, given that they set the range of \mu_3 to be [0, 4].
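To fix ideas before turning to those experiments, here is a minimal sketch of the second, extended variant of the Newton iteration above, with \lambda_0 treated as an ordinary unknown and the integrals computed by Gauss-Legendre quadrature. The function name, the quadrature choice, and the undamped Newton step are our illustration, not a transcription of any of the cited implementations:

```python
import numpy as np

def maxent_newton(mu, lam0=None, lo=0.0, hi=1.0, tol=1e-10,
                  max_iter=100, n_quad=200):
    """Fit p(x) = exp(-sum_i lam[i] x^i) on [lo, hi] whose moments
    int x^i p(x) dx match mu[i], i = 0..k (with mu[0] = 1)."""
    k = len(mu) - 1
    t, w = np.polynomial.legendre.leggauss(n_quad)
    x = lo + 0.5 * (t + 1.0) * (hi - lo)      # nodes mapped to [lo, hi]
    w = 0.5 * (hi - lo) * w
    powers = x[None, :] ** np.arange(2 * k + 1)[:, None]   # x^0 .. x^(2k)
    lam = np.zeros(k + 1) if lam0 is None else np.array(lam0, float)
    for _ in range(max_iter):
        p = np.exp(-lam @ powers[: k + 1])    # density at the nodes
        m = powers @ (w * p)                  # m[i] = int x^i p(x) dx
        r = mu - m[: k + 1]                   # gradient: mu_i - mu_i(lam)
        if np.max(np.abs(r)) < tol:
            return lam
        H = m[np.add.outer(np.arange(k + 1), np.arange(k + 1))]  # Hessian
        lam = lam - np.linalg.solve(H, r)     # Newton step, as in Eq. (5)
    raise RuntimeError("Newton did not converge; try better initial values")
```

The update line is Equation (5) with the normalization condition folded into the system; from a good starting point it converges in a handful of iterations, and the failure branch is where the sequential strategy of the next section takes over.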

We conducted intensive experiments with different numerical integration techniques.[8] We found that the final solution was not affected by the choice of integration technique, although using a more accurate numerical integration technique generally reduced the number of iterations.

4 Sequential Updating of the Maxent Density

In Bayesian analysis and information processing, it is known that the order in which information is incorporated into the learning process is irrelevant.[9] Hence, instead of imposing all the moment constraints simultaneously, we can impose the moment constraints from lower to higher order and update the density estimates sequentially.

As shown in the previous section, solving for the maxent density subject to k moment constraints \mu is equivalent to solving the following system of equations:

    \int x^i \exp\left( -\sum_{j=0}^{k} \lambda_j x^j \right) dx = \mu_i, \quad i = 0, 1, \ldots, k.    (7)

There exists a unique solution if the moment conditions satisfy the necessary and sufficient condition of Theorem 2. Therefore, \mu is a function of \lambda. Denoting \mu = f(\lambda), we know f(\cdot) is a differentiable function, since Equation (7) is everywhere continuous and differentiable in \lambda. By the Inverse Function Theorem, the inverse function \lambda = f^{-1}(\mu) = g(\mu) is also differentiable. Taking a Taylor expansion of \lambda, we obtain

    \lambda = g(\mu^0 + \Delta\mu) \approx g(\mu^0) + g'(\mu^0) \Delta\mu.    (8)

Equation (8) suggests that we can get a first order approximation of the \lambda corresponding to \mu = \mu^0 + \Delta\mu if \lambda^0 = g(\mu^0) is known. By itself this result is not useful, since we do not know the functional form of g(\cdot). For sufficiently small \Delta\mu, one possible approach is to use \lambda^0 as initial values when solving for \lambda = g(\mu) by Newton's method. If \Delta\mu is not small enough, we may not be able to obtain convergence for \lambda = g(\mu) using \lambda^0 as initial values. In this case, we can divide \Delta\mu into a finite number M of small segments such that \Delta\mu = \sum_{i=1}^{M} \Delta\mu_i and solve for \lambda_m = g(\mu^0 + \sum_{i=1}^{m} \Delta\mu_i) using \lambda_{m-1} as initial values, for m = 1, \ldots, M.

[8] The techniques include Simpson's method, Clenshaw-Curtis quadrature, adaptive Gauss-Kronrod quadrature, adaptive double-exponential quadrature, the adaptive Genz-Malik algorithm, and some Monte Carlo and quasi-Monte Carlo methods.

[9] See Zellner (1998) on the order invariance of maximum entropy procedures.

Eventually we can reach the solution for \lambda = g(\mu), as long as \mu satisfies the necessary and sufficient condition of Theorem 2. However, this approach is very inefficient, if not infeasible, because it involves a multi-dimensional grid search when the number of moment constraints is larger than one.

Fortunately, we can reduce the search to one dimension if we choose to impose the moment constraints sequentially. Suppose that for a given finite sample we can solve for \lambda_k = g(\mu_k), where \mu_k collects the first k sample moments, using arbitrary initial values (usually a vector of zeros, to avoid arithmetic overflow). Since higher moments are generally not independent of lower moments, the estimates from lower moments can serve as a close proxy for the maxent density that is also subject to additional higher moments. Thus, if we fail to solve for \lambda_{k+1} = g(\mu_{k+1}) using arbitrary initial values, we can use \lambda^0_{k+1} = [\lambda_k; 0] as initial values. Note that the choice of zero as the initial value for \lambda_{k+1} is not simply for convenience, but is also consistent with the principle of maximum entropy. With only the first k moments incorporated into the estimates, the coefficient \lambda_{k+1} in p(x) = \exp\left( -\sum_{i=0}^{k+1} \lambda_i x^i \right) should be set to zero, since no information has been incorporated for the estimation of \lambda_{k+1}. In other words, if we do not use \mu_{k+1} as a side condition, the term x^{k+1} should not appear in the maxent density function. In this sense, zero is the most honest, or most uninformative, guess for \lambda_{k+1} in terms of information theory.

Corresponding to the most uninformative guess \lambda_{k+1} = 0 is the predicted (k+1)th moment

    \nu_{k+1} = \int x^{k+1} \exp\left( -\sum_{i=0}^{k} \lambda_i x^i \right) dx,

which is the unique maxent predicted value for \mu_{k+1} based on the first k moments.[10] If \nu_{k+1} is close to \mu_{k+1}, the difference \Delta\mu_{k+1} between the vector of actual moments \mu_{k+1} and [\mu_k; \nu_{k+1}] is small. Therefore, if we use \lambda^0_{k+1} = [\lambda_k; 0] as initial values to solve for \lambda_{k+1} = g(\mu_{k+1}), convergence can often be obtained in a few iterations. If we fail to reach the solution using \lambda^0_{k+1} as initial values, we can divide the difference between \nu_{k+1} and \mu_{k+1} into finite small segments and approach the solution in multiple steps, as above. A sketch of this sequential scheme follows.

[10] Maximizing the entropy subject to the first k moments is equivalent to maximizing the entropy subject to the same k moments and the predicted (k+1)th moment \nu_{k+1}. Since \nu_{k+1} is a function of the first k moments, it is not binding when used together with the first k moments as side conditions. Consequently, the Lagrange multiplier \lambda_{k+1} for \nu_{k+1} is zero.
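A minimal sketch of the sequential scheme, built on the maxent_newton sketch from Section 3 (again our own illustration; the segmented fallback between \nu_{k+1} and \mu_{k+1} is indicated in a comment rather than implemented):

```python
import numpy as np

def maxent_sequential(mu, k_start=2):
    """Impose the moments in mu[1:] from lower to higher order, warm-starting
    each Newton solve from the previous solution padded with a zero."""
    lam = maxent_newton(mu[: k_start + 1])        # low-order fit, zeros start
    for k in range(k_start + 1, len(mu)):
        lam = maxent_newton(mu[: k + 1], lam0=np.append(lam, 0.0))
        # If this solve fails, split mu[k] - nu_k into small segments and
        # take several warm-started solves instead of one, as in Eq. (8).
    return lam
```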

We note that the estimation of the maxent density becomes more sensitive to the choice of initial values as the number of moment constraints rises, partially because the Hessian matrix approaches singularity as its dimension increases. Fortunately, the difference between the predicted moment \nu_{k+1} based on the first k moments and the actual moment \mu_{k+1} approaches zero as k increases. This occurs because one can use p_k(x) = \exp\left( -\sum_{i=0}^{k} \lambda_i x^i \right) to approximate the underlying distribution arbitrarily well for sufficiently large k (Theorem 3). The higher k is, the closer p_k(x) is to the underlying distribution, and subsequently the smaller the difference between \mu_{k+1} and the predicted moment \nu_{k+1}. Hence, the sequential method is especially useful when the number of moment constraints is large.

On the other hand, sometimes we do not need to incorporate all the moment conditions. For example, the maxent density subject to the first moment is the exponential distribution, p(x) = \exp(-\lambda_0 - \lambda_1 x), and the maxent density subject to the first two moments is the normal distribution, p(x) = \exp(-\lambda_0 - \lambda_1 x - \lambda_2 x^2). So the first moment is the sufficient statistic for an exponential distribution, and the first two moments are the sufficient statistics for a normal distribution. In this case, the difference between the predicted moment \nu_{k+1} and the actual moment \mu_{k+1} can serve as a useful indicator of whether to impose more moment conditions.

5 Approximation of the U.S. Family Income Distribution

In this section, we apply the sequential method to the approximation of the size distribution of U.S. family income. We run an experiment using U.S. family income data from the 1999 Current Population Survey (CPS) March Supplement. The data consist of 5,000 observations of family income drawn randomly from the 1999 March CPS. We fit the maxent density p(x) = \exp\left( -\sum_{i=0}^{k} \lambda_i x^i \right) for k from 4 to 12, incremented by 2.[11] Newton's method with a vector of zeros as initial values fails to converge when the number of moment constraints is larger than six, and we use the sequential algorithm instead.

Table 1 compares the predicted moment \nu_{k+1} based on the first k moment constraints with the sample moment \mu_{k+1}. As the number of moment constraints increases, the prediction becomes more precise. For k \geq 8, the predicted and actual moments are virtually identical. This suggests that the information content of additional moment conditions is low when the number of moment constraints is sufficiently large.

[11] Typically the income distribution is skewed with an extended right tail, which warrants including at least the first four moments in the estimation. Moreover, we should have an even number of moment conditions to ensure that the density function integrates to unity.
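A sketch of this exercise, using the helpers defined above; since the CPS extract is not reproduced here, a lognormal sample stands in for the income data, and predicted_moment (our name) computes \nu_{k+1} from the current fit for the Table 1 comparison:

```python
import numpy as np
from numpy.polynomial import polynomial as P

def predicted_moment(lam, j, n_quad=400):
    """nu_j = int_0^1 x^j exp(-sum_i lam[i] x^i) dx under the current fit."""
    t, w = np.polynomial.legendre.leggauss(n_quad)
    t, w = 0.5 * (t + 1.0), 0.5 * w
    return float(np.sum(w * t ** j * np.exp(-P.polyval(t, lam))))

income = np.random.default_rng(1).lognormal(10.5, 0.8, 5000)  # stand-in data
x, (a, b) = to_unit_interval(income)
mu = np.array([np.mean(x ** i) for i in range(13)])           # mu_0 .. mu_12

lam = maxent_newton(mu[:5])                    # k = 4, zeros as start
for k in (6, 8, 10, 12):
    nu = predicted_moment(lam, k - 1)          # maxent guess for next moment
    print(f"nu_{k-1} = {nu:.6f}   mu_{k-1} = {mu[k-1]:.6f}")
    lam = maxent_newton(mu[: k + 1], lam0=np.r_[lam, 0.0, 0.0])
```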

For the exponential family, the method of moments estimates are equivalent to maximum likelihood estimates.[12] Hence, we can use the log-likelihood ratio to test the functional specification. Given p(x_j) = \exp\left( -\sum_{i=0}^{k} \lambda_i x_j^i \right) for j = 1, 2, \ldots, N, the log-likelihood can be conveniently calculated as

    L = \sum_{j=1}^{N} \ln p(x_j) = -N \sum_{i=0}^{k} \lambda_i \hat{\mu}_i,

where \hat{\mu}_i is the ith sample moment. Since the maximized entropy for a discrete variable subject to the known moment constraints is W = -\sum_{j=1}^{N} p(x_j) \ln p(x_j) = \sum_{i=0}^{k} \lambda_i \mu_i, the log-likelihood is equivalent to the negative of the maximized entropy multiplied by the number of observations.

The first column of Table 2 lists the log-likelihood for the estimated maxent densities, and the second column reports the log-likelihood ratio of p_{k+2}(x) = \exp\left( -\sum_{i=0}^{k+2} \lambda_i x^i \right) versus p_k(x) = \exp\left( -\sum_{i=0}^{k} \lambda_i x^i \right). This log-likelihood ratio is asymptotically distributed as \chi^2 with one degree of freedom (critical value 3.84 at the 5% significance level). The log-likelihood ratio test favors the more general model p_{k+2}(x) over our range of k.

Soofi et al. (1995) argue that the information discrepancy between two distributions can be measured in terms of their entropy difference. They define an index for comparing two distributions:

    ID(p, p_0) = 1 - \exp\left( -K(p : p_0) \right),

where K(p : p_0) = \int p(x) \ln \frac{p(x)}{p_0(x)} \, dx is the relative entropy, or Kullback-Leibler distance, an information-theoretic measure of the discrepancy between two distributions. The third column of Table 2 reports the ID indices between p_{k+2}(x) and p_k(x). We can see that the discrepancy decreases as more moment conditions enter the estimation. This suggests that as the number of moment conditions gets large, the information content of an additional moment decreases.

We test the goodness-of-fit of the maxent density estimates using a two-sided Kolmogorov-Smirnov (KS) test. The fourth column of Table 2 reports the KS statistics of the estimated maxent densities. The critical value of the KS test at the 5% significance level is 0.0192 for our sample. Thus, the KS test fails to reject the null hypothesis that our income sample is distributed as p_k(x) = \exp\left( -\sum_{i=0}^{k} \lambda_i x^i \right) for k = 8, 10, 12.

To avoid overfitting, we calculate the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) to check the balance between the accuracy of the estimation and the rule of parsimony.

[12] This maximum entropy method is equivalent to the ML approach where the likelihood is defined over the exponential distribution with k parameters. Golan et al. (1996) use a duality theorem to show this relationship.
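These diagnostics are all inexpensive to compute from the fitted \lambda and the sample moments. A short sketch under the same conventions as before, treating the number of moment constraints k as the parameter count in AIC and BIC (our reading of the setup):

```python
import numpy as np

def diagnostics(lam, mu, n_obs):
    """Log-likelihood L = -N * sum_i lam[i] mu[i] (with mu[0] = 1), AIC, BIC."""
    loglik = -n_obs * float(np.dot(lam, mu[: len(lam)]))
    k = len(lam) - 1                      # number of moment constraints
    aic = -2.0 * loglik + 2.0 * k
    bic = -2.0 * loglik + k * np.log(n_obs)
    return loglik, aic, bic

def lr_stat(loglik_k, loglik_k2):
    """Likelihood ratio statistic for p_{k+2}(x) versus p_k(x)."""
    return 2.0 * (loglik_k2 - loglik_k)
```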

The results are reported in the fifth and sixth columns of Table 2. The AIC favors the model with 12 moment constraints. The BIC, which carries a greater complexity penalty, favors the model with the first eight moment constraints.

Lastly, we compare the maxent densities with two conventional income distributions, fitting a lognormal distribution and a gamma distribution to the income sample.[13] The relevant tests are reported in the last two rows of Table 2. Both distributions fail the KS test and are outperformed by our preferred maxent densities in all the tests.

Figure 1 reports the histogram of the income sample and the estimated p(x) = \exp\left( -\sum_{i=0}^{12} \lambda_i x^i \right). The fitted density closely resembles the shape of the histogram of the income sample. Although the domain over which the density is evaluated is substantially wider than the sample range at either end, the estimated density demonstrates good performance in both tails.

6 Further Numerical Experiments

Without loss of generality, we apply the sequential updating method to standardized moments with \mu_3 in [0, 4] and \mu_4 in [\mu_3^2 + 1, 20], incremented by 0.1. This range of values for \mu_3 and \mu_4 is broader than that considered by Ormoneit and White (1999), where \mu_3 is in [0, 3] and \mu_4 in [\mu_3^2 + 1, 10].[14] The fitted moments of all the estimated densities agree with the actual moments to at least 12 decimal places, which demonstrates that the proposed algorithm is extremely precise.

When \mu_3 = 0 and \mu_4 = 3, the maxent density is the standard normal distribution. The theoretical values for the standard normal distribution are \lambda_0 = \frac{1}{2} \log(2\pi), \lambda_1 = \lambda_3 = \lambda_4 = 0 and \lambda_2 = \frac{1}{2}, since

    \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{x^2}{2} \right) = \exp\left( -\frac{1}{2} x^2 - \frac{\log(2\pi)}{2} \right).

Using the theoretical values as a benchmark, the estimated \hat{\lambda} for \mu_3 = 0 and \mu_4 = 3 are accurate to at least 15 decimal places (a sketch of this check appears below).

The estimated \hat{\lambda}_i, i = 1, 2, 3, 4, are plotted against the values of \mu_3 and \mu_4 in Figures 2 and 3. Their patterns closely resemble those reported by Ormoneit and White (1999) for a smaller range of \mu_3 and \mu_4. The estimated density functions for [\mu_3 = 0, \mu_4 = 1.1], [\mu_3 = 0, \mu_4 = 3] (the standard normal), and [\mu_3 = 4, \mu_4 = 20] are plotted in Figure 4.

[13] The lognormal distribution and gamma distribution are in fact maxent densities subject to certain geometric moment constraints.

[14] All the densities in this section are evaluated over [-20, 20].
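The standard normal benchmark is a two-line check given the maxent_newton sketch from Section 3; here the density is integrated over [-20, 20] as in footnote 14, and seeding with the exact k = 2 solution padded with zeros mirrors the sequential logic: since \nu_3 = 0 and \nu_4 = 3 already equal the target moments, Newton's method converges immediately (names and setup are ours):

```python
import numpy as np

mu = np.array([1.0, 0.0, 1.0, 0.0, 3.0])      # mu_0..mu_4, standardized
start = np.array([0.5 * np.log(2 * np.pi), 0.0, 0.5, 0.0, 0.0])
lam = maxent_newton(mu, lam0=start, lo=-20.0, hi=20.0)
print(np.abs(lam - start).max())              # ~0: the fit is standard normal
```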

The Figure 4 example demonstrates the maxent density's flexibility in handling multimodal distributions.[15] Consistent with the finding of Rockinger and Jondeau (2002), we find that for small kurtosis the density is squeezed toward the center. In fact, as we can see from Figure 4, the density becomes bimodal for sufficiently small kurtosis. On the other hand, a small mode appears in the tail of the distribution to accommodate large skewness when kurtosis is relatively small.

The fact that the \lambda space is rather flat over almost the entire region, except near the boundary \mu_4 = \mu_3^2 + 1, further justifies our method of approaching the optimum by adding moment conditions sequentially. Although the functional form of \lambda = g(\mu) is unknown, the flatness of the \lambda space suggests that g'(\cdot) is close to zero. Therefore, even for moderately large \Delta\mu, the Taylor approximation \lambda = g(\mu^0) + g'(\mu^0)\Delta\mu can be very accurate over most of the \lambda space. Consequently, using \lambda^0 = g(\mu^0) as initial values to solve for \lambda = g(\mu) should be easy, since \lambda \approx \lambda^0.

7 Conclusion

The maximum entropy (maxent) approach is a powerful and flexible tool for density estimation, which nests most of the commonly used distributions as special cases. In this paper, I discuss the necessary and sufficient conditions for a distribution to be uniquely identified by a maxent density. I show that there exists a unique maxent density for any finite sample when arithmetic sample moments are used as side conditions.

The calculation of a maxent density subject to multiple moment constraints is quite sensitive to the choice of initial values, and the problem becomes more difficult as the number of moment constraints increases. I propose a sequential updating method for maximum entropy density calculation. Instead of imposing the moment constraints simultaneously, this method incorporates the information contained in the moments into the estimation process sequentially, from lower to higher moments. Consistent with the maximum entropy principle, I use the estimated coefficients based on lower moments as initial values to update the density estimates when higher order moment constraints are imposed.

I apply the proposed method to approximate the size distribution of 1999 U.S. family income. Traditional specification and goodness-of-fit tests, along with an entropy-based test, are used within the maximum entropy framework for model diagnostics. The maxent densities are compared with traditional income distributions and shown to outperform

[15] The possible number of modes is determined by the number of moments used and their corresponding coefficients \lambda (Cobb et al., 1983).

them in all tests. Empirical examples and intensive numerical experiments suggest that the maximum entropy approach is a powerful tool for the empirical approximation of size distributions, and that the proposed sequential updating method is efficient in calculating a wide variety of maxent densities.

References

Buchen, P., M. Kelly, 1996. The maximum entropy distribution of an asset inferred from option prices. Journal of Financial and Quantitative Analysis 31(1).

Carter, R. G., 1993. Numerical experience with a class of algorithms for nonlinear optimization using inexact function and gradient information. SIAM Journal on Scientific Computing 14.

Cobb, L., P. Koppstein, N. H. Chen, 1983. Estimation and moment recursion relations for multimodal distributions of the exponential family. Journal of the American Statistical Association 78.

Durrett, R., 1995. Probability: Theory and Examples, 2nd edn. Duxbury Press.

Gill, P. E., W. Murray, M. H. Wright, 1981. Practical Optimization. Academic Press, San Diego.

Golan, A., G. Judge, D. Miller, 1996. Maximum Entropy Econometrics: Robust Estimation with Limited Data. John Wiley and Sons, New York.

Hawkins, R., 1997. Maximum entropy and derivative securities. Advances in Econometrics 12.

Jaynes, E. T., 1957. Information theory and statistical mechanics. Physical Review 106.

Kapur, J. N., H. K. Kesavan, 1992. Entropy Optimization Principles with Applications. Academic Press.

Mead, L. R., N. Papanicolaou, 1984. Maximum entropy in the problem of moments. Journal of Mathematical Physics 25(8).

Miller, L. H., 1956. Table of percentage points of Kolmogorov statistics. Journal of the American Statistical Association 51(273).

Ormoneit, D., H. White, 1999. An efficient algorithm to compute maximum entropy densities. Econometric Reviews 18(2).

Owen, A., 1991. Empirical likelihood for linear models. The Annals of Statistics 19.

Qin, J., J. Lawless, 1994. Empirical likelihood and general estimating equations. The Annals of Statistics 22.

Rockinger, M., E. Jondeau, 2002. Entropy densities with an application to autoregressive conditional skewness and kurtosis. Journal of Econometrics 106.

Soofi, E., N. Ebrahimi, M. Habibullah, 1995. Information distinguishability with application to analysis of failure data. Journal of Econometrics 90.

Stutzer, M., 1996. A simple nonparametric approach to derivative security valuation. Journal of Finance 51(5).

Wu, X., 2002. Formula for affine transformation of maxent densities. Unpublished manuscript, University of California, Berkeley.

Zellner, A., 1997. The Bayesian method of moments (BMOM): theory and applications. Advances in Econometrics 12.

Zellner, A., 1998. On order invariance of maximum entropy procedures. Unpublished manuscript, Graduate School of Business, University of Chicago.

Zellner, A., R. A. Highfield, 1988. Calculation of maximum entropy distributions and approximation of marginal posterior distributions. Journal of Econometrics 37.

Zellner, A., J. Tobias, 2001. Further results on Bayesian method of moments analysis of the multiple regression model. International Economic Review 42(1).

Table 1: Sample moments \mu_{k+1}, predicted moments \nu_{k+1}, and their difference \nu_{k+1} - \mu_{k+1}, for the fitted values of k.

Table 2: Specification and goodness-of-fit tests for estimated maxent densities, for k = 4, 6, 8, 10, 12 and for the fitted lognormal and gamma distributions.
Columns: (1) log-likelihood; (2) log-likelihood ratio test, p_{k+2}(x) versus p_k(x); (3) Soofi et al. (1995)'s ID index, p_{k+2}(x) versus p_k(x); (4) Kolmogorov-Smirnov test; (5) Akaike Information Criterion; (6) Bayesian Information Criterion.

Figure 1: Histogram and estimated maxent density based on the first 12 moments of 1999 family income; x-axis in $1,000.

Figure 2: \hat{\lambda}_1 (top) and \hat{\lambda}_2 (bottom) for \mu_3 in [0, 4] and \mu_4 in [\mu_3^2 + 1, 20].

Figure 3: \hat{\lambda}_3 (top) and \hat{\lambda}_4 (bottom) for \mu_3 in [0, 4] and \mu_4 in [\mu_3^2 + 1, 20].

Figure 4: Estimated maxent density for [\mu_3 = 0, \mu_4 = 1.1] (dashed), [\mu_3 = 0, \mu_4 = 3] (solid), and [\mu_3 = 4, \mu_4 = 20] (dotted).


More information

Analysis of Bayesian Dynamic Linear Models

Analysis of Bayesian Dynamic Linear Models Analysis of Bayesian Dynamic Linear Models Emily M. Casleton December 17, 2010 1 Introduction The main purpose of this project is to explore the Bayesian analysis of Dynamic Linear Models (DLMs). The main

More information

HOMEWORK 5 SOLUTIONS. n!f n (1) lim. ln x n! + xn x. 1 = G n 1 (x). (2) k + 1 n. (n 1)!

HOMEWORK 5 SOLUTIONS. n!f n (1) lim. ln x n! + xn x. 1 = G n 1 (x). (2) k + 1 n. (n 1)! Math 7 Fall 205 HOMEWORK 5 SOLUTIONS Problem. 2008 B2 Let F 0 x = ln x. For n 0 and x > 0, let F n+ x = 0 F ntdt. Evaluate n!f n lim n ln n. By directly computing F n x for small n s, we obtain the following

More information

5 Numerical Differentiation

5 Numerical Differentiation D. Levy 5 Numerical Differentiation 5. Basic Concepts This chapter deals with numerical approximations of derivatives. The first questions that comes up to mind is: why do we need to approximate derivatives

More information

Variables Control Charts

Variables Control Charts MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. Variables

More information

Lecture Notes on Elasticity of Substitution

Lecture Notes on Elasticity of Substitution Lecture Notes on Elasticity of Substitution Ted Bergstrom, UCSB Economics 210A March 3, 2011 Today s featured guest is the elasticity of substitution. Elasticity of a function of a single variable Before

More information

DRAFT. Further mathematics. GCE AS and A level subject content

DRAFT. Further mathematics. GCE AS and A level subject content Further mathematics GCE AS and A level subject content July 2014 s Introduction Purpose Aims and objectives Subject content Structure Background knowledge Overarching themes Use of technology Detailed

More information

Machine Learning and Pattern Recognition Logistic Regression

Machine Learning and Pattern Recognition Logistic Regression Machine Learning and Pattern Recognition Logistic Regression Course Lecturer:Amos J Storkey Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh Crichton Street,

More information

Notes on Probability and Statistics

Notes on Probability and Statistics Notes on Probability and Statistics Andrew Forrester January 28, 2009 Contents 1 The Big Picture 1 2 Counting with Combinatorics 2 2.1 Possibly Useful Notation...................................... 2 2.2

More information

BANACH AND HILBERT SPACE REVIEW

BANACH AND HILBERT SPACE REVIEW BANACH AND HILBET SPACE EVIEW CHISTOPHE HEIL These notes will briefly review some basic concepts related to the theory of Banach and Hilbert spaces. We are not trying to give a complete development, but

More information

In this section, we will consider techniques for solving problems of this type.

In this section, we will consider techniques for solving problems of this type. Constrained optimisation roblems in economics typically involve maximising some quantity, such as utility or profit, subject to a constraint for example income. We shall therefore need techniques for solving

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models

More information

CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a

More information

THREE DIMENSIONAL GEOMETRY

THREE DIMENSIONAL GEOMETRY Chapter 8 THREE DIMENSIONAL GEOMETRY 8.1 Introduction In this chapter we present a vector algebra approach to three dimensional geometry. The aim is to present standard properties of lines and planes,

More information

Support Vector Machines Explained

Support Vector Machines Explained March 1, 2009 Support Vector Machines Explained Tristan Fletcher www.cs.ucl.ac.uk/staff/t.fletcher/ Introduction This document has been written in an attempt to make the Support Vector Machines (SVM),

More information

Understanding the Impact of Weights Constraints in Portfolio Theory

Understanding the Impact of Weights Constraints in Portfolio Theory Understanding the Impact of Weights Constraints in Portfolio Theory Thierry Roncalli Research & Development Lyxor Asset Management, Paris thierry.roncalli@lyxor.com January 2010 Abstract In this article,

More information

Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

More information

12.5: CHI-SQUARE GOODNESS OF FIT TESTS

12.5: CHI-SQUARE GOODNESS OF FIT TESTS 125: Chi-Square Goodness of Fit Tests CD12-1 125: CHI-SQUARE GOODNESS OF FIT TESTS In this section, the χ 2 distribution is used for testing the goodness of fit of a set of data to a specific probability

More information

Inequality, Mobility and Income Distribution Comparisons

Inequality, Mobility and Income Distribution Comparisons Fiscal Studies (1997) vol. 18, no. 3, pp. 93 30 Inequality, Mobility and Income Distribution Comparisons JOHN CREEDY * Abstract his paper examines the relationship between the cross-sectional and lifetime

More information

Section 4.4 Inner Product Spaces

Section 4.4 Inner Product Spaces Section 4.4 Inner Product Spaces In our discussion of vector spaces the specific nature of F as a field, other than the fact that it is a field, has played virtually no role. In this section we no longer

More information

Mathematical finance and linear programming (optimization)

Mathematical finance and linear programming (optimization) Mathematical finance and linear programming (optimization) Geir Dahl September 15, 2009 1 Introduction The purpose of this short note is to explain how linear programming (LP) (=linear optimization) may

More information

Properties of sequences Since a sequence is a special kind of function it has analogous properties to functions:

Properties of sequences Since a sequence is a special kind of function it has analogous properties to functions: Sequences and Series A sequence is a special kind of function whose domain is N - the set of natural numbers. The range of a sequence is the collection of terms that make up the sequence. Just as the word

More information

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data

More information