Inverse Gaussian Distribution


Paul E. Johnson <pauljohn@ku.edu>

June 10, 2013

The Inverse Gaussian distribution is a member of the exponential family of distributions. It is one of the distributions implemented in R's Generalized Linear Model routines. To my surprise, there are whole books dedicated to this distribution (V. Seshadri, The Inverse Gaussian Distribution: A Case Study in Exponential Families, Oxford University Press, 1994; R. S. Chhikara and J. L. Folks, The Inverse Gaussian Distribution: Theory, Methodology, and Applications, New York: Dekker, 1989). Articles on insurance problems and the stock market often claim that observations follow an Inverse Gaussian distribution. It has one mode in the interior of the range of possible values, and it is skewed to the right, sometimes with an extremely long tail. The fact that extremely large outcomes can occur even when almost all outcomes are small is a distinguishing characteristic.

1 Mathematical Description

There are Inverse Gaussian distributions in several R packages. Run help.search("inverse gaussian") to see for yourself. In VGAM, the documentation for inv.gaussianff matches the information in the package statmod's documentation on dinvgauss, so let's follow that approach.

The distribution of x_i is described by two characteristics, a mean µ > 0 and a precision λ > 0. The probability density function is

p(x;\mu,\lambda) = \sqrt{\frac{\lambda}{2\pi x^3}}\; e^{-\frac{\lambda(x-\mu)^2}{2\mu^2 x}}, \quad 0 < x < \infty

If you would like to take λ out of the square root, you can put this down as

p(x;\mu,\lambda) = \frac{\sqrt{\lambda}}{\sqrt{2\pi x^3}}\; e^{-\frac{(\sqrt{\lambda}(x-\mu))^2}{2\mu^2 x}}, \quad 0 < x < \infty
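The density above is easy to check numerically. The following sketch (in Python rather than R, simply so it stands alone; the function name invgauss_pdf is my own) codes the formula directly and confirms by brute-force integration that it integrates to 1, that the mean is µ, and that the variance matches the µ³/λ closed form given just below, for µ = 1, λ = 2.

```python
import math

def invgauss_pdf(x, mu, lam):
    # p(x; mu, lambda) = sqrt(lam / (2*pi*x^3)) * exp(-lam*(x - mu)^2 / (2*mu^2*x))
    return math.sqrt(lam / (2.0 * math.pi * x**3)) * \
        math.exp(-lam * (x - mu)**2 / (2.0 * mu**2 * x))

mu, lam = 1.0, 2.0
dx = 0.001
total = mean = var = 0.0
# midpoint rule on (0, 60); the tail mass beyond 60 is numerically negligible here
for i in range(60000):
    x = (i + 0.5) * dx
    p = invgauss_pdf(x, mu, lam) * dx
    total += p
    mean += x * p
    var += (x - mu)**2 * p

print(total, mean, var)  # approximately 1, mu = 1, and mu^3/lam = 0.5
```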

The expected value is µ, and the variance of this version of the inverse Gaussian distribution is

\mathrm{Var}(x) = \frac{\mu^3}{\lambda}

The skewness and kurtosis are, respectively,

3\sqrt{\mu/\lambda} \quad \text{and} \quad 15\mu/\lambda

In all honesty, I have no intuition whatsoever about what the appearance of this probability model might be. It does not have a kernel smaller than the density itself, so we can't just throw away part of it on the grounds that it is a normalizing constant or a factor of proportionality. And to make matters worse, the variance depends on the mean.

Tidbit: if µ = 1, this is called the Wald distribution.

2 Illustrations

The probability density function of an Inverse Gaussian distribution with µ = 1 and λ = 2 is shown in Figure 1. The R code which produces that figure is:

library(statmod)
mu <- 1
lambda <- 2
xrange <- seq(from = 0.0, to = 2 * mu + 5 / lambda, by = 0.02)
mainlabel <- expression(paste("IG(", mu, ",", lambda, ")", sep = ""))
xprob <- dinvgauss(xrange, mu = mu, lambda = lambda)
plot(xrange, xprob, type = "l", main = mainlabel, xlab = "", ylab = "")

How would one describe Figure 1?

1. Single-peaked
2. Not symmetric: tail to the right

How does this distribution change in appearance if µ and λ are changed? Let's do some experimentation. The following R code creates an array of figures with 4 rows and 3 columns for various values of µ and λ; the result is displayed in Figure 2.

par(mfrow = c(4, 3))
for (i in 1:4) {
  for (j in 1:3) {
    mu <- 3 * i
    lambda <- 20 * j

Figure 1: Inverse Gaussian Distribution, IG(µ, λ). [plot not reproduced]

    xrange <- seq(from = 0.0, to = 3 * mu, by = 0.02)
    mainlabel <- paste("IG(", mu, ",", lambda, ")", sep = "")
    xprob <- dinvgauss(xrange, mu = mu, lambda = lambda)
    plot(xrange, xprob, type = "l", main = mainlabel,
         xlab = "possible values of x", ylab = "probability of x")
  }
}

I had a very hard time believing that these calculations were correct. The graph shows almost no impact of changing the parameters. The mistake I had made was assuming that the Inverse Gaussian's tail would be cut off in a way that makes a nice picture. It turns out that, when λ is small, extremely huge values can be observed. I found that out by plotting histograms of random samples drawn under various parameter settings. For example, if we draw 1000 observations from an Inverse Gaussian with µ = 12 and λ = 2, look what happens in Figure 3.

Can you describe this variety in a nutshell? Sometimes the IG has a very long tail, stretching far to the right, and that makes the expected value a very poor description of the modal observation. Probably the best illustration I have found for this model is presented in Figure 5.

In the Web pages for the old classic program Dataplot (http://www.itl.nist.gov/div898/software/dataplo), I found this interesting comment. That page uses the parameter gamma in place of lambda; otherwise the formula is the same. It says, "The inverse Gaussian distribution is symmetric and moderate tailed for small gamma. It is highly skewed and long tailed for large gamma. It approaches normality as gamma approaches zero." I can't find any evidence in favor of this characterization in Figure 6. The opposite is more likely true, making me suspect that in the Dataplot author's mind, the parameter gamma (γ) might at one time have been 1/λ. This leads me to caution students: if you want to be confident about one of these results, it is not sufficient to just take the word of a randomly chosen Web site.
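To put a number on how much tail mass a small λ creates, the comparison can be done by direct numerical integration of the density. This is a Python sketch, not from the text: the density is coded from the formula in Section 1, and the threshold of 100 and grid sizes are my own choices. It compares P(X > 100) under IG(12, 2) and IG(12, 20).

```python
import math

def invgauss_pdf(x, mu, lam):
    # sqrt(lam / (2*pi*x^3)) * exp(-lam*(x - mu)^2 / (2*mu^2*x))
    return math.sqrt(lam / (2.0 * math.pi * x**3)) * \
        math.exp(-lam * (x - mu)**2 / (2.0 * mu**2 * x))

def tail_prob(mu, lam, threshold, upper=4000.0, dx=0.01):
    # midpoint-rule approximation of P(X > threshold); the mass beyond
    # `upper` is negligible for these parameter values
    n = int((upper - threshold) / dx)
    return sum(invgauss_pdf(threshold + (i + 0.5) * dx, mu, lam) * dx
               for i in range(n))

p_loose = tail_prob(12.0, 2.0, 100.0)   # low precision: long right tail
p_tight = tail_prob(12.0, 20.0, 100.0)  # high precision: tail dies quickly
```

With the mean held at 12, shrinking λ from 20 to 2 raises the probability of an observation beyond 100 substantially, which is exactly the pattern in the histograms of Figure 4.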
To see the effect of tuning λ up and down, consider Figure 6. This shows pretty clearly that if you think of λ as a precision parameter and you have high precision, then the observations are likely to be tightly clustered and symmetric about the mean. When you lose precision, as λ gets smaller, the strong long tail to the right emerges.

3 Why do people want to use this distribution?

We want a distribution that can reach up high and admit some extreme values. It is pretty easy to estimate µ and λ by maximum likelihood. An alternative distribution with this general shape is the three-parameter Weibull distribution, which is more difficult to estimate (W. E. Bardsley, "Note on the Use of the Inverse Gaussian Distribution for Wind Energy Applications", Journal of Applied Meteorology, 19(9): 1126-1130).
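The text does not spell out the maximum likelihood estimators, but for the inverse Gaussian they have well-known closed forms: µ̂ = x̄ and λ̂ = n / Σ(1/xᵢ − 1/x̄). The Python sketch below draws samples with the Michael-Schucany-Haas (1976) transformation method (my choice of sampler, not something used in the text) and recovers the parameters:

```python
import math
import random

def rinvgauss(mu, lam, rng):
    # Michael-Schucany-Haas (1976) transformation method
    y = rng.gauss(0.0, 1.0) ** 2
    x = mu + mu * mu * y / (2.0 * lam) \
        - (mu / (2.0 * lam)) * math.sqrt(4.0 * mu * lam * y + (mu * y) ** 2)
    # keep x with probability mu / (mu + x), otherwise take mu^2 / x
    return x if rng.random() <= mu / (mu + x) else mu * mu / x

rng = random.Random(42)
mu_true, lam_true = 2.0, 5.0
xs = [rinvgauss(mu_true, lam_true, rng) for _ in range(20000)]

n = len(xs)
mu_hat = sum(xs) / n                                   # MLE of mu
lam_hat = n / sum(1.0 / x - 1.0 / mu_hat for x in xs)  # MLE of lambda

print(mu_hat, lam_hat)
```

With 20,000 draws, both estimates land close to the true values of 2 and 5.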

Figure 2: Variety of Inverse Gaussian distributions. Panels (by row): IG(3,20), IG(3,40), IG(3,60); IG(6,20), IG(6,40), IG(6,60); IG(9,20), IG(9,40), IG(9,60); IG(12,20), IG(12,40), IG(12,60). [plots not reproduced]

Figure 3: Sample from Inverse Gaussian with µ = 12 and λ = 2. Density plot, N = 1000, mu = 12, lambda = 2; maximum observed x = 340.29 (bandwidth = 1.15). [plot not reproduced]

Figure 4: Random Sample From Various Inverse Gaussians. Panels: mu = 12, lambda = 2 (xlo); mu = 12, lambda = 5 (xmed); mu = 12, lambda = 20 (xhi). [plots not reproduced]

Figure 5: More Inverse Gaussian Distributions. Panels: IG(2, various lambda) and IG(10, various lambda), each with lambda = 1, 10, 50, 100. [plots not reproduced]

Figure 6: Shrinking lambda. Panels: IG(1,50), IG(1,10), IG(1,5), IG(1,1), IG(1,0.5), IG(1,0.001). [plots not reproduced]

4 Would you rather have Gamma or Inverse-Gaussian?

The Gamma and the Inverse-Gaussian share the property that they may be skewed to the right. If you choose the right parameter values, you can make them practically indistinguishable. However, there is a VERY WEIRD scaling property here. In order for the Inverse Gaussian to produce some extremely large values, it must have higher probability for large values. How can you make sense out of this strange result in Figure 8?
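One concrete way to see it: the gamma(shape = 2, scale = 1) density decays like e^(−x), while the IG(µ = 2, λ = 5) density decays like e^(−λx/(2µ²)) = e^(−5x/8), which is slower. So however similar the two curves look near their modes, far enough into the right tail the inverse Gaussian always carries more density. A small Python sketch of the comparison (both densities coded from their standard formulas; the checkpoints 50, 100, 200 are arbitrary):

```python
import math

def gamma_pdf(x, shape, scale):
    # x^(shape-1) * exp(-x/scale) / (Gamma(shape) * scale^shape)
    return x**(shape - 1) * math.exp(-x / scale) / \
        (math.gamma(shape) * scale**shape)

def invgauss_pdf(x, mu, lam):
    # sqrt(lam / (2*pi*x^3)) * exp(-lam*(x - mu)^2 / (2*mu^2*x))
    return math.sqrt(lam / (2.0 * math.pi * x**3)) * \
        math.exp(-lam * (x - mu)**2 / (2.0 * mu**2 * x))

# density ratio IG(2, 5) / gamma(2, 1) at points far out in the right tail;
# the ratio exceeds 1 and keeps growing as x increases
ratios = [invgauss_pdf(x, 2.0, 5.0) / gamma_pdf(x, 2.0, 1.0)
          for x in (50.0, 100.0, 200.0)]
```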

Figure 7: Gamma or Inverse Gaussian? [plot not reproduced]

xvals <- seq(0, 20, length.out = 200)
gam <- dgamma(xvals, shape = 2, scale = 1)
igaus <- dinvgauss(xvals, mu = 2, lambda = 5)
plot(xvals, gam, type = "l", lty = 1, main = "")
lines(xvals, igaus, lty = 2)
legend(6, .2, c("gamma(sh=2,sc=1)", "inv gauss(mu=2,lambda=5)"),
       lty = 1:2)

Figure 8: Compare Gamma and Inverse Gaussian. [plots not reproduced]

par(mfcol = c(3, 1))
for (i in 1:3) {
  minx <- 20 + 50 * (i - 1)
  xvals <- seq(minx, 300, length.out = 1000)
  gam <- dgamma(xvals, shape = 2, scale = 1)
  igaus <- dinvgauss(xvals, mu = 2, lambda = 5)
  plot(xvals, gam, type = "l", lty = 1, main = "")
  lines(xvals, igaus, lty = 2)
  legend(150, 0.7 * max(gam),
         c("gamma(sh=2,sc=1)", "inv gauss(mu=2,lambda=5)"),
         lty = 1:2)
}

Figure 9: Compare Samples: Gamma and Inverse Gaussian. [histograms not reproduced]

Gamma, sh = 2, sc = 1: Min. 0.0502, 1st Qu. 0.9770, Median 1.7240, Mean 2.0160, 3rd Qu. 2.7520, Max. 11.5400; sample variance 2.067561.

Inverse Gaussian, mu = 2, lambda = 4: Min. 0.2959, 1st Qu. 1.1160, Median 1.7240, Mean 2.0410, 3rd Qu. 2.5270, Max. 10.5000; sample variance 1.831236.

Figure 10: Compare Samples: Gamma and Inverse Gaussian. [histograms not reproduced]

Gamma, sh = 2, sc = 2: Min. 0.1192, 1st Qu. 1.9150, Median 3.3560, Mean 4.0570, 3rd Qu. 5.3900, Max. 22.7900; sample variance 8.55414.

Inverse Gaussian, mu = 4, lambda = 8: Min. 0.648, 1st Qu. 2.062, Median 3.125, Mean 3.825, 3rd Qu. 4.802, Max. 18.150; sample variance 6.715464.

Figure 11: Compare Samples: Gamma and Inverse Gaussian. [histograms not reproduced]

Gamma, sh = 3, sc = 4: Min. 0.709, 1st Qu. 6.868, Median 10.600, Mean 11.860, 3rd Qu. 15.730, Max. 42.260; sample variance 43.43887.

Inverse Gaussian, mu = 12, lambda = 34: Min. 2.472, 1st Qu. 7.040, Median 10.130, Mean 12.220, 3rd Qu. 15.140, Max. 62.900; sample variance 60.23548.