Optimal Income Taxation: Mirrlees Meets Ramsey Jonathan Heathcote FRB Minneapolis Hitoshi Tsujiyama Goethe University Frankfurt Ohio State University, April 17 2014
The views expressed herein are those of the authors and not necessarily those of the Federal Reserve Bank of Minneapolis or the Federal Reserve System
How should we tax income? What structure of labor income taxation offers best trade-off between benefits of public insurance and costs of distortionary taxes? Systems currently in place combine: Tax rates that increase with income Transfers that decline with income Proposals for a flat tax system with universal transfers Friedman (1962) Mirrlees (1971)
This Paper We compare 3 tax and transfer systems: 1. Affine tax system: T (y) = τ 0 + τ 1 y constant marginal rates with lump-sum transfers 2. HSV tax system: T (y) = y λy 1 τ increasing marginal rates without transfers 3. Optimal tax system fully non-linear
Main Findings Best tax and transfer system in the HSV class better than the best affine tax system More valuable to have marginal tax rates increase with income than to have lump-sum transfers Welfare gains moving from the current tax system to the optimal one are tiny
Mirrlees Approach to Tax Design: Mirrlees (1971), Diamond (1988), Saez (2001) Planner only observes earnings = productivity effort Think of planner choosing earnings x and cons. c for each unobservable productivity type α Include incentive constraints, s.t. each type prefers the earnings level intended for their type Allocations are constrained efficient Trace out tax decentralization T(x(α)) = x(α) c(α)
Novel Elements of Our Analysis 1. Our model has a distinct role for private insurance Standard decentralization of efficient allocations delivers all insurance through tax system Very progressive taxes 2. We use a SWF that rationalizes amount of redistribution embedded in observed tax system Analyzes typically assume utilitarian social welfare function Strong desire for redistribution 3. Explore whether simple Ramsey-style tax functions can come close to decentralizing efficient allocations Affine tax function HSV tax function
Static environment Environment 1 Heterogeneous individual labor productivity w Log productivity is sum of two independent stochastic components α no private insurance ε private insurance log w = α + ε planner sees neither component of productivity (later introduce a third productivity component κ that the planner can observe)
Environment 2 Common preferences u(c, h) = log(c) h1+σ 1 + σ Production linear in aggregate effective hours exp(α + ε)h(α, ε)df α df ε = c(α, ε)df α df ε + G
Planner s Problems Seeks to maximize SWF denoted W(α, ε) Takes G as given Takes private insurance for ε as given Only observes total income y = earnings plus private insurance income taxes must be based on income Think of planner as choosing consumption and income for each private type (α, ε) Include incentive constraints s.t. each private type (α, ε) prefers the income level intended for their type
Timeline 1. First Stage Planner offers menu of contracts {c( α, ε), y( α, ε)} Agents draw idiosyncratic α and report α 2. Second Stage Agents buy private insurance against insurable shock ε Draw ε, receive insurance payments and report ε Work sufficient hours to deliver y( α, ε) Receive consumption c( α, ε)
Limitations of Our Analysis 1. We abstract from life-cycle uninsurable shocks Farhi and Werning, 2010; Golosov, Troshkin and Tsyvinkski 2011 2. We abstract from distortions to skill investment Heathcote, Storesletten and Violante, 2013; Guvenen, Kuruscu and Ozkan, 2012; Krueger and Ludwig, 2013
First result: Cannot condition on ε Offered contracts take the form {c( α), y( α)} Private insurance markets utility cost of delivering resources independent of ε undercuts planner s ability to condition allocations on ε Planner cannot take over private insurance Distinct roles for public and private insurance Note: if ε were observable, public insurance would be better than private
Planner s Problem: Second Best max c(α),y(α) s.t. W(α)U(α, α)dfα y(α)dfα c(α)df α + G U(α, α) U(α, α) α, α where U(α, α) max h(ε),b(ε) s.t. { } log c( α) h(ε;α, α)1+σ 1+σ df ε Q(ε)B(ε; α, α)dε = 0 exp(α + ε)h(ε; α, α) + B(ε; α, α) = y( α) ε price of insurance E Q(ε)dε = E df ε
Planner s Problem: Second Best Substituting the solution to the second problem into the first { ( ) } 1+σ W(α) log c(α) Ω y(α) exp(α) df α max c(α),y(α) subject to y(α)dfα c(α)df α + G ( ) 1+σ ( 1+σ log c(α) Ω y(α) exp(α) log c( α) Ω y(ã) exp(α)) α ( exp(ε) 1+σ σ df ε ) σ where Ω 1+σ Standard Mirrlees Problem Solution can be decentralized with income (consumption) taxes
Planner s Problem: Ramsey max τ s.t. W(α) { u(c(α, ε), h(α, ε))dfε } dfα y(α, ε)dfα df ε c(α, ε)df α df ε + G where c(α, ε) and h(α, ε) are the solutions to { } max log(c(α, ε)) h(α,ε)1+σ 1+σ df ε c(α,ε),h(α,ε),b(α,ε) s.t. Q(ε)B(α, ε)dε = 0 c(α, ε) y(α, ε) T(y(α, ε); τ) ε where y(α, ε) exp(α + ε)h(α, ε) + B(α, ε)
Baseline Tax System Assume the current US tax / transfer system is given by T (y; λ, τ) = y λy 1 τ Functional form introduced by Feldstein (1969), Persson (1983), and Benabou (2000) Closely approximates US tax & transfer system (Heathcote, Storesletten, and Violante, 2014) τ indexes progressivity of the tax system
Fit of HSV Tax Function Log of post government income 8.5 9 9.5 10 10.5 11 11.5 12 12.5 13 8.5 9 9.5 10 10.5 11 11.5 12 12.5 13 Log of pre government income Estimated on PSID data for 2000-2006 Households with head / spouse hours 260 per year Estimated value for τ = 0.151, R 2 = 0.96
Social Welfare Want to compare welfare under US tax scheme just described to various alternatives What SWF should we use for this comparison? Our view: current tax system embeds particular social preference for redistribution What taste for redistribution is consistent with observed progressivity (τ = 0.151)?
Social Welfare Assume SWF takes the form W(α) = exp( θα) θ controls taste for redistribution (e.g. θ = 0 : utilitarian) Assume US govt choosing a tax system in HSV class T (y) = y λy 1 τ What value for θ rationalizes observed choice for τ? Empirically-Motivated SWF: θ US that solves τ (θ US ) = τ US
Social Welfare In general, solve for θ US numerically Special case: If F s are normal then { } θ US = (1 τ US ) + 1 1 1 v α 1+σ (1 g US )(1 τ US ) 1 θ is increasing in τ and g θ is decreasing in v α and σ As σ, if θ = θ US, then τ US delivers first best Use θ US as baseline for welfare comparisons focus on relative efficiency of alternative tax systems
Calibration Frisch elasticity = 0.5 σ = 2 Progressivity parameter τ = 0.151 (HSV 2014) Govt spending G s.t. G/Y = 0.188 (US, 2005) var(ε) = 0.193: estimated variance of insurable shocks (HSV 2013) var(α) = 0.273: total variance of wages is 0.466
Wage Distribution Heavy right tail of labor earnings distribution, well-approximated by Pareto distribution (Saez, 2001) Assume Pareto tail reflects uninsurable wage dispersion F α : Exponentially Modified Gaussian EMG(µ, σ 2, a) α is the sum of independent Normal and Exponential random variables F ε : Normal N( vε 2, v ε) log(w) = α + ε is itself EMG w is Pareto-Lognormal log(wh) is also EMG, given our utility function, market structure, and HSV tax system
Distribution for α 0.83439 0.30903 0.11446 0.04239 0.01570 0.00581 0.00215 0.00080 0.00030 0.00011 0.00004 0.00002 0.00001 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 Density (log scale) Data (SCF 2007) NORMAL EMG 9.22 9.50 9.77 10.05 10.32 10.59 10.87 11.14 11.42 11.69 11.97 12.24 12.52 12.79 13.06 13.34 13.61 13.89 14.16 14.44 14.71 14.99 15.26 15.53 15.81 16.08 16.36 16.63 16.91 17.18 17.46 17.73 18.01 Log Labor Income Use micro data from the 2007 SCF to estimate α by maximum likelihood a = 2.2
Numerical Implementation Maintain continuous distribution for ε Assume a discrete distribution for α Baseline: 10,000 evenly-spaced grid points α min : $5 per hour (12% of the average = $41.56) α max : $3,075 per hour ($6.17m assuming 2,000 hours = 99.99th percentile of SCF earnings distn.) Set µ and σ 2 to match E [e α ] = 1 and target for var(α) given a = 2.2
Wage Distribution x 10-3 8 7 6 Density 5 4 3 2 1 0 0 1 2 3 4 5 6 7 8 Hourly Wages
Social Welfare Function 3 2.5 Relative Pareto Weight 2 1.5 1 0.5 1 2 3 4 5 6 7 8 exp(alpha)
Quantitative Analysis Focus on four cases: 1. HSV tax function: T (y) = y λy 1 τ 2. Affine tax function: T (y) = τ 0 + τ 1 y 3. Cubic tax function: T (y) = τ 0 + τ 1 y + τ 2 y 2 + τ 3 y 3 4. Mirrless tax function (second best allocation)
Quantitative Analysis: Benchmark Tax System Tax Parameters Outcomes welfare Y mar. tax TR/Y HSV US λ 0.836 τ 0.151 0.311 0.018 Affine τ 0 0.116 τ 1 0.303 0.58 0.41 0.303 0.089 Cubic τ 0 0.032 τ 1 0.126 τ 2 0.064 τ 3 0.003 0.05 0.79 0.289 0.017 Mirrlees 0.11 0.82 0.287 0.003
Quantitative Analysis: Benchmark Moving to affine tax system is walfare reducing Increasing marginal rates more important than lump-sum transfers Moving to fully optimal system generates only tiny gains (0.1%) The optimal marginal tax rate is around 30% Almost no need for transfers
HSV Tax Function Log Consumption 6 4 2 0-2 -4 Mirrlees Ramsey Hours Worked 1.2 1 0.8 0.6 0.4-2 0 2 4 alpha -2 0 2 4 alpha Marginal Tax Rate 0.6 0.5 0.4 0.3 0.2 0.1 0-2 0 2 4 alpha Average Tax Rate 1 0.5 0-0.5-1 -2 0 2 4 alpha
Affine Tax Function Log Consumption 6 4 2 0-2 -4 Mirrlees Ramsey Hours Worked 1.2 1 0.8 0.6 0.4-2 0 2 4 alpha -2 0 2 4 alpha Marginal Tax Rate 0.6 0.5 0.4 0.3 0.2 0.1 0-2 0 2 4 alpha Average Tax Rate 1 0.5 0-0.5-1 -2 0 2 4 alpha
Cubic Tax Function Log Consumption 6 4 2 0-2 -4 Mirrlees Ramsey Hours Worked 1.2 1 0.8 0.6 0.4-2 0 2 4 alpha -2 0 2 4 alpha Marginal Tax Rate 0.6 0.5 0.4 0.3 0.2 0.1 0-2 0 2 4 alpha Average Tax Rate 1 0.5 0-0.5-1 -2 0 2 4 alpha
Quantitative Analysis: Sensitivity What drives the results? 1. Empirically-motivated SWF Utilitarian SWF: θ = 0 2. Eliminate insurable shocks: ṽ α = v α + v ε and ṽ ε = 0 3. Wage distribution has thin Log-Normal right tail: α LN
Sensitivity: Utilitarian SWF Utilitarian SWF stronger taste for redistribution Want higher tax rates and larger transfers Optimal HSV still better than optimal affine Tax System Tax Parameters Outcomes welfare Y mar. tax TR/Y HSV US λ : 0.836 τ : 0.151 0.311 0.018 HSV λ : 0.821 τ : 0.295 1.38 6.02 0.436 0.068 Affine τ 0 : 0.233 τ 1 : 0.452 0.45 6.43 0.452 0.220 Mirrlees 1.53 6.15 0.440 0.122
Sensitivity: No Insurable Shocks No insurable shocks larger role for public redistribution Want higher tax rates and larger transfers Optimal HSV still better than optimal affine Tax System Tax Parameters Outcomes welfare Y mar. tax TR/Y HSV US λ : 0.836 τ : 0.151 0.311 0.018 HSV λ : 0.839 τ : 0.192 0.12 1.64 0.346 0.033 Affine τ 0 : 0.156 τ 1 : 0.360 0.21 1.97 0.360 0.139 Mirrlees 0.23 2.11 0.361 0.081 Utilitarian SWF + No insurable shocks Lump-sum transfers more important Optimal HSV worse than optimal affine
Sensitivity: Log-Normal Wage Log-normal distribution thin right tail Optimal HSV worse than optimal affine Optimal affine nearly efficient Tax System Tax Parameters Outcomes welfare Y mar. tax TR/Y HSV US λ : 0.836 τ : 0.151 0.311 0.018 HSV λ : 0.826 τ : 0.070 0.27 3.10 0.239 0.005 Affine τ 0 : 0.068 τ 1 : 0.250 0.34 2.67 0.250 0.042 Mirrlees 0.35 2.71 0.249 0.042
Why Distribution Shape Matters Want high marginal rates at the top when (i) few agents face those marginal rates, but (ii) can capture lots of revenue from higher-income households 1 0.9 Benchmark Log-normal 0.8 Coefficient (1-F(y))/(yf(y)) 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0 1 2 3 4 5 6 7 8 Income y
Efficient Marginal Tax Rates: Sensitivity 0.6 Benchmark Utilitarian No Insurable Shocks Log-Normal 0.5 Marginal Tax Rate 0.4 0.3 0.2 0.1 0-2 -1.5-1 -0.5 0 0.5 1 1.5 2 alpha
Extension: Coarse Grid Coarse grid Mirrlees Planner can do much better Gives Mirrlees planner too much power if true distribution continuous # of grid points Welfare (relative to HSV) Affine Mirrlees First Best 25 0.58 3.82 8.80 100 0.58 1.17 8.79 1,000 0.58 0.21 8.80 10,000 0.58 0.11 8.80 100,000 0.58 0.10 8.80
Extension: Coarse Grid 0.6 0.5 25 grids 10 2 grids 10 3 grids 10 4 grids 10 5 grids Marginal Tax Rate 0.4 0.3 0.2 0.1 0-2 -1 0 1 2 3 4 alpha
Extension: Type-Contingent Taxes Productivity partially reflects observable characteristics (e.g. education, age, gender) Some fraction of uninsurable shocks are observable: α α + κ Heathcote, Perri & Violante (2010) estimate variance of cross-sectional wage dispersion attributable to observables, v κ = 0.108 Planner should condition taxes on observables: T(y; κ) Consider two-point distribution for κ (college vs high school)
Extension: Type-Contingent Taxes Significant welfare gains relative to non-contingent tax Conditioning on observables marginal tax rates of 20% System Outcomes wel. Y mar. TR/Y HSV US λ : 0.827,τ : 0.151 0.311 0.017 0.015 HSV λ L : 0.988, τ L : 0.180 λ H : 0.694, τ H : 0.059 1.34 4.64 0.212 0.043 0.061 Affine τ L 0 : 0.140, τ L 1: 0.151 1.39 5.20 0.199 τ H 0 : 0.095, τ H 1 : 0.224 Mirrlees 1.46 5.20 0.200 0.126 0.137 0.103 0.136
Conclusions Moving from current HSV system to optimal affine system is welfare reducing Increasing marginal rates more important than lump-sum transfers Moving from current HSV system to fully optimal system generates tiny welfare gains Ramsey and Mirrlees tax schemes not far apart: can approximately decentralize SB with a simple tax scheme Important to measure the gap in terms of allocations and welfare, not in terms of marginal tax rates Want to condition both transfers and tax rates on observables