Inverse Gaussian Distribution Paul E. Johnson <pauljohn@ku.edu> June 10, 2013 The Inverse Gaussian distribution is an exponential distribution. It is one of the distributions implemented in R s Generalized Linear Model routines. To my surprise, there are whole books dedicated to this distribution (V. Seshadri, The Inverse Gaussian Distribution: A Case Study in Exponential Families, Oxford University Press, 1994; R. S. Chhikara and J. L. Folks, The Inverse Gaussian Distribution: Theory, Methodology, and Applications, New York: Dekker, 1989). Articles on insurance problems and the stock market often claim that observations follow an Inverse Gaussian distribution. It has one mode in the interior of the range of possible values and it is skewed to the right, sometimes with an extremely long tail. The fact that extremely large outcomes can occur even when almost all outcomes are small is a distinguishing characteristic. 1 Mathematical Description There are Inverse Gaussian distributions in several R packages. Run h e l p. s e a r c h ( " i n v e r s e gaussian " ) to see for yourself. In VGAM, the documentation for inv.gaussianff matches the information in in the package statmod s documentation on dinvgauss. So let s follow that approach. The distribution of x i is described by two characteristics, a mean µ > 0 and precision λ > 0. The probability density function is λ λ(x µ) 2 p(x;µ,λ) = 2πx 3 e 2µ 2 x, 0 < x < If you would like to take λ out of the square root, you can put this down as p(x;µ,λ) = λ2 ( λ(x µ)) 2 e 2µ 2 x, 0 < x < 2πx 3 1
The expected value is µ and the variance of this version of the inverse Gaussian distribution is V ar(x) = µ3 λ The skewness and kurtosis are, respectively, 3 µ λ and 15µ λ In all honesty, I have no intuition whatsoever about what the appearance of this probability model might be. It does not have a kernel smaller than the density itself, so we can t just throw away part on the grounds that it is a normalizing constant or a factor of proportionality. And to make matters worse, the variance depends on the mean. Tidbit: If µ = 1 this is called the Wald distribution. 2 Illustrations The probability density function of a Inverse Gaussian distribution with µ = 1 and λ = 2 is shown in Figure 1. The R code which produces that figure is: library ( statmod ) mu <- 1 lambda <- 2 xrange <- seq ( from =0.0,to =2*mu +5/lambda,by =0.02 ) mainlabel <- expression ( paste ("IG(",mu,",",lambda,")", sep ="")) xprob <- dinvgauss ( xrange, mu = mu, lambda = lambda ) plot ( xrange, xprob, type = " l", main = mainlabel, xlab = " ", ylab = " ") How would one describe Figure 1? 1. Single-peaked 2. Not symmetric tail to the right How does this distribution change in appearance if µ and λ are changed? Let s do some experimentation. The following R code creates an array of figures with 4 rows and 3 columns with various values of µ and λ that is displayed in Figure 2. par ( mfrow=c ( 4, 3 ) ) f o r ( i in 1 : 4 ) { f o r ( j in 1 : 3 ) { mu < 3 i lambda < 20 j 2
Figure 1: Inverse Gaussian Distribution IG(µ,λ) 0.0 0.2 0.4 0.6 0.8 1.0 0 1 2 3 4 3
} } xrange < seq ( from = 0.0, to = 3 mu, by = 0. 0 2 ) mainlabel< paste ( " IG( ",mu, ", ", lambda, " ) ", sep = " " ) xprob < dinvgauss ( xrange, mu = mu, lambda = lambda ) p l o t ( xrange, xprob, type = " l ", main = mainlabel, xlab = " p o s s i b l e v a l ues o f x ", ylab = " p r o b a b i l i t y o f x " ) I had a very hard time believing that these calculations were correct. The graph shows almost no impact of changing parameters. The mistake I had made was assuming that the Inverse Gamma s tail will be cut off in a way that makes a nice picture. It turns out that, when the λ is small, then there can be extremely huge values observed. I found that by plotting histograms of random samples from various parameter settings. For example, if we draw 1000 observations from Inverse Gamma with µ = 12 and λ = 2, look what happens in Figure 3: Can you describe this variety in a nutshell? Sometimes the IG has a very long tail, stretching far to the right, and it makes the expected value a very poor description of the modal observation. Probably the best illustration I have found for this model is presented in Figure 5. In the Webpages for the old classic program Dataplot (http://www.itl.nist.gov/div898/software/dataplo I found this interesting comment. That page uses the parameter gamma in place of lambda, otherwise the formula is the same. It says, The inverse Gaussian distribution is symmetric and moderate tailed for small gamma. It is highly skewed and long tailed for large gamma. It approaches normality as gamma approaches zero. I can t find any evidence in favor of this characterization in Figure 6. The opposite is more likely true, making me suspect that in the Dataplot author s mind, the parameter gamma (γ) might have at one time been 1/λ. This leads me to caution studentst that if you want to be confident about one of these results, it is not sufficient to just take the word of a randomly chosen Web site. To see the effect of tuning λ up and down, consider figure 6. This shows pretty clearly that if you think of λas a precision parameter and you have high precision, then the observations are likely to be tightly clustered and symmetric about the mean. When you lose precision, as λ gets smaller, then the strong long tail to the right emerges. 3 Why do people want to use this distribution? We want a distribution that can reach up high and admit some extreme values. It is pretty easy to estimate µ and λ by maximum likelihood. An alternative distribution with this general shape is the three parameter Weibull distribution, which is more difficult to estimate (W.E. Bardsley, Note on the Use of the Inverse Gaussian Distribution for Wind Energy Applications, Journal of Applied Meteorology, 19(9): 1126-1130). 4
IG(3,20) Figure 2: Variety of Inverse Gaussian IG(3,40) IG(3,60) 0.0 0.1 0.2 0.3 0.4 0.0 0.2 0.4 0.0 0.2 0.4 0.6 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 IG(6,20) IG(6,40) IG(6,60) 0.00 0.05 0.10 0.15 0.00 0.10 0.20 0.00 0.10 0.20 0 5 10 15 0 5 10 15 0 5 10 15 IG(9,20) IG(9,40) IG(9,60) 0.00 0.04 0.08 0.00 0.04 0.08 0.12 0.00 0.04 0.08 0.12 0 5 10 15 20 25 0 5 10 15 20 25 0 5 10 15 20 25 IG(12,20) IG(12,40) IG(12,60) 0.00 0.04 0.08 0.00 0.04 0.08 0.00 0.04 0.08 0 5 10 20 30 0 5 10 20 30 5 0 5 10 20 30
Figure 3: Sample from Inverse Gaussian with µ = 12 and λ = 2 Density plot, N=1000,mu= 12 lambda=2 Density 0.00 0.05 0.10 0.15 maximum observed x = 340.29 0 50 100 150 200 250 300 350 N = 1000 Bandwidth = 1.15 6
Figure 4: Random Sample From Various Inverse Gaussians mu= 12 lambda=2 0 100 200 300 400 500 600 0 50 100 150 200 250 300 350 xlo mu= 12 lambda=5 0 50 100 150 200 0 50 100 150 200 250 300 350 xmed mu= 12 lambda=20 0 20 40 60 0 50 100 150 200 250 300 350 7 xhi
Figure 5: More Inverse Gaussian Distributions IG(2, various lambda) 0.0 0.2 0.4 0.6 0.8 lambda= 1 lambda= 10 lambda= 50 lambda= 100 0 1 2 3 4 IG(10, various lambda) 0.0 0.1 0.2 0.3 0.4 0.5 0.6 lambda= 1 lambda= 10 lambda= 50 lambda= 100 0 5 10 15 20 8
Figure 6: Shrinking lambda IG(1,50) IG(1,10) 0.0 1.0 2.0 0.0 0.6 1.2 0.0 0.5 1.0 1.5 2.0 0.0 0.5 1.0 1.5 2.0 IG(1,5) IG(1,1) 0.0 0.4 0.8 0.0 0.4 0.8 0.0 0.5 1.0 1.5 2.0 0.0 0.5 1.0 1.5 2.0 IG(1,0.5) IG(1,0.001) 0.0 0.5 1.0 1.5 0 1 2 3 4 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 9
4 Would you rather have Gamma or Inverse-Gaussian? The Gamma and the Inverse-Gaussian share the property that they are possibly skewed to the right. If you choose the correct parameter values, you can make them practically indistinguisable. However, there is a VERY WEIRD scaling property here. In order for the Inverse Gaussian to produce some extreme large values, it must have higher probability for large values. How can you make sense out of this strange result in Figure 10
Figure 7: Gamma or Inverse Gaussian? x v a l s < seq ( 0, 2 0, l e n g t h. o u t =200) gam < dgamma( xvals, shape =2, s c a l e =1) i g a u s < dinvgauss ( xvals,mu=2,lambda=5) p l o t ( xvals, gam, type=" l ", l t y =1,main=" " ) l i n e s ( xvals, igaus, l t y =2) legend (6,.2, c ( "gamma( sh =2, sc =1) ", " inv gauss (mu=2,lambda=5) " ), l t y =1:2,) gam 0.0 0.1 0.2 0.3 gamma(sh=2,sc=1) inv gauss(mu=2,lambda=5) 0 5 10 15 20 xvals 11
Figure 8: Compare Gamma and Inverse Gaussian par ( mfcol=c ( 3, 1 ) ) f o r ( i in 1 : 3 ) { minx < 20 + 50 ( i 1 ) x v a l s < seq ( minx, 3 0 0, l e n g t h. o u t =1000) gam < dgamma( xvals, shape = 2, s c a l e = 1) i g a u s < dinvgauss ( xvals, mu = 2, lambda = 5) p l o t ( xvals, gam, type=" l ", l t y = 1, main=" " ) l i n e s ( xvals, igaus, l t y = 2) legend (150, 0. 7 max(gam), c ( "gamma( sh =2, sc =1) ", " inv gauss ( mu=2,lambda=5) " ), l t y = 1 : 2, ) } gam 0e+00 3e 08 50 100 150 200 250 300 xvals gamma(sh=2,sc=1) inv gauss(mu=2,lambda=5) gam 0.0e+00 2.0e 29 gamma(sh=2,sc=1) inv gauss(mu=2,lambda=5) 100 150 200 250 300 xvals gam 0e+00 6e 51 gamma(sh=2,sc=1) inv gauss(mu=2,lambda=5) 150 200 250 300 xvals 12
Figure 9: Compare Samples: Gamma and Inverse Gaussian Min. 1 s t Qu. Median Mean 3 rd Qu. Max. 0.0502 0.9770 1.7240 2.0160 2.7520 11.5400 [ 1 ] 2.067561 Min. 1 s t Qu. Median Mean 3 rd Qu. Max. 0.2959 1.1160 1.7240 2.0410 2.5270 10.5000 [ 1 ] 1.831236 Gamma,sh=2,sc=1 0 50 150 0 2 4 6 8 10 12 gam Inv Gaus, mu=2,lambda=4 0 50 150 0 2 4 6 8 10 igaus 13
Figure 10: Compare Samples: Gamma and Inverse Gaussian Min. 1 s t Qu. Median Mean 3 rd Qu. Max. 0.1192 1.9150 3.3560 4.0570 5.3900 22.7900 [ 1 ] 8.55414 Min. 1 s t Qu. Median Mean 3 rd Qu. Max. 0.648 2.062 3.125 3.825 4.802 18.150 [ 1 ] 6.715464 Gamma, sh=2, sc=2 0 100 250 0 5 10 15 20 gam Inv Gauss, mu=4, lambda=8 0 200 400 0 5 10 15 20 igaus 14
Figure 11: Compare Samples: Gamma and Inverse Gaussian Min. 1 s t Qu. Median Mean 3 rd Qu. Max. 0.709 6.868 10.600 11.860 15.730 42.260 [ 1 ] 43.43887 Min. 1 s t Qu. Median Mean 3 rd Qu. Max. 2.472 7.040 10.130 12.220 15.140 62.900 [ 1 ] 60.23548 Gamma, sh=3,sc=4 0 10 30 0 10 20 30 40 gam Inv Gauss, mu=12,lambda=34 0 20 40 0 10 20 30 40 50 60 igaus 15