Tilted Normal Distribution and Its Survival Properties


Journal of Data Science 10(2012)

Sudhansu S. Maiti (1) and Mithu Dey (2)
(1) Visva-Bharati University and (2) Asansol Engineering College

Abstract: To analyze skewed data, the skew normal distribution was proposed by Azzalini (1985). Because of practical problems in estimating the skewness parameter of this distribution, Gupta and Gupta (2008) suggested the power normal distribution as an alternative. We search for another alternative, named the tilted normal distribution, following the approach of Marshall and Olkin (1997) of adding a positive parameter to a general survival function and taking the survival function to be of normal form. We derive different properties of this distribution and obtain the maximum likelihood estimates of its parameters. The tilted normal distribution is compared with the skew normal and power normal distributions.

Key words: Akaike information criterion, compound distribution, failure rate, Kolmogorov discrepancy measure, maximum likelihood.

1. Introduction

The skew normal distribution, proposed by Azzalini (1985), can be a suitable model for the analysis of data exhibiting a unimodal density with some skewness present, a structure often occurring in data analysis. The proposed distribution is a generalization of the standard normal distribution and its probability density function (pdf) is given by

$$\phi(z; \lambda) = 2\phi(z)\Phi(\lambda z), \quad -\infty < z < \infty, \tag{1.1}$$

where $\phi(x)$ and $\Phi(x)$ denote the N(0, 1) density and distribution function respectively. The parameter $\lambda$ regulates the skewness and $\lambda = 0$ corresponds to the standard normal case. The density given by (1.1) enjoys a number of formal properties which resemble those of the normal distribution. For example, if X has the pdf given by (1.1), then $X^2$ has a chi-square distribution with one degree of freedom. That is, all even moments of X are exactly the same as the corresponding even moments of the standard normal distribution. For more information, see Azzalini and Capitanio (1999) and Genton (2004).

A motivation of the skew normal distribution has been elegantly exhibited by Arnold et al. (1993). This model can naturally arise in applications as a hidden truncation and/or selective reporting model, see Arnold and Beaver (2002).

Gupta and Gupta (2008) observed that the estimation of the skewness parameter of model (1.1) is problematic when the sample size is not large enough. Monti (2003) noted that the estimate is $\hat\lambda = \pm\infty$, even when the data are generated by a model with finite $\lambda$. In the case of a skew normal model the moment estimator of $\lambda$ is the solution of

$$\sqrt{\frac{2}{\pi}}\,\frac{\lambda}{\sqrt{1+\lambda^2}} = \bar{X}. \tag{1.2}$$

Therefore, the solution exists if and only if $|\bar{X}| < \sqrt{2/\pi}$. The exact distribution of $\bar{X}$ is not known and it is not easy to establish. Gupta and Gupta (2008) estimated the value of $P(|\bar{X}| > \sqrt{2/\pi})$ from simulated data for different values of n and $\lambda$. They have shown that as the sample size increases, the probability that the moment estimator is feasible increases. On the other hand, as the skewness parameter increases, the chance of obtaining the moment estimator decreases.

To find the maximum likelihood estimator (MLE) of $\lambda$ based on a random sample $X_1, X_2, \ldots, X_n$, the likelihood equation is

$$\sum_{i=1}^{n} X_i\,\frac{\phi(\lambda X_i)}{\Phi(\lambda X_i)} = 0. \tag{1.3}$$

It is clear that if all $X_i$ have the same sign then there is no solution of (1.3), see also Liseo (1990) and Azzalini and Capitanio (1999). The probability that X > 0, for $\lambda > 0$, is an increasing function of $\lambda$. When the absolute value of the skewness parameter is large, the probability that all the observations have the same sign is large. Therefore, for small to moderate sample sizes, the MLE may not be accurate enough for practical use. The Centered Parametrization (CP) can solve these difficulties to some extent.
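The two estimation difficulties described above can be made concrete with a short Python sketch. It is an illustration only, not code from the paper: the function names are hypothetical, and the closed form P(X > 0) = 1/2 + arctan(λ)/π for the standard skew normal model (1.1) is a standard result brought in here rather than stated in the text.

```python
import math

ROOT_2_OVER_PI = math.sqrt(2.0 / math.pi)  # bound on |E(X)| under model (1.1)


def moment_estimate_lambda(xbar):
    """Solve sqrt(2/pi) * lam / sqrt(1 + lam^2) = xbar, i.e. equation (1.2).

    Returns None when |xbar| >= sqrt(2/pi), where no finite solution exists.
    """
    if abs(xbar) >= ROOT_2_OVER_PI:
        return None
    return xbar / math.sqrt(2.0 / math.pi - xbar * xbar)


def prob_all_same_sign(lam, n):
    """P(all n i.i.d. observations share one sign), in which case (1.3) has no root.

    Assumes P(X > 0) = 1/2 + atan(lam)/pi for the standard skew normal.
    """
    p_pos = 0.5 + math.atan(lam) / math.pi
    return p_pos ** n + (1.0 - p_pos) ** n


if __name__ == "__main__":
    for lam in (1.0, 5.0, 10.0):
        print(lam, [round(prob_all_same_sign(lam, n), 4) for n in (10, 20, 50)])
```

For a fixed sample size the same-sign probability grows with |λ|, and for fixed λ it shrinks as n grows, which is the behaviour described in the text.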
Azzalini (1985) reparametrized the problem by writing $X = \mu + \sigma Z_0$, where $Z_0 = (Z - \mu_z)/\sigma_z$, $\sigma_z = (1 - \mu_z^2)^{1/2}$, $\mu_z = \sqrt{2/\pi}\,\lambda/\sqrt{1+\lambda^2}$, and considering the centred parameters CP $= (\mu, \sigma, \gamma_1)$ instead of the direct parameters DP $= (\mu, \sigma, \lambda)$. Here $\gamma_1$ is the usual univariate index of skewness, taken with the same sign as that of $\lambda$ [for details, see Azzalini and Capitanio (1999)]. The set of S-PLUS routines developed for the computation is freely available at the following World Wide Web address:

For the above reason, Gupta and Gupta (2008) considered another skewed model for which the normal distribution is a special case. They proposed the power normal model, whose distribution function and density function are given by

$$F_2(x; \alpha) = [\Phi(x)]^{\alpha}, \quad -\infty < x < \infty,\ \alpha > 0, \tag{1.4}$$

and

$$f_2(x; \alpha) = \alpha[\Phi(x)]^{\alpha-1}\phi(x), \quad -\infty < x < \infty,\ \alpha > 0. \tag{1.5}$$

This is basically an exponentiated family of distributions with the normal as baseline distribution, that is, a system of adding one or more parameters to the existing normal distribution. Such an addition of parameters makes the resulting distribution richer and more flexible for modeling data. The power normal distribution has several nice properties. For $\alpha = 1$ it reduces to the standard normal distribution. Its density is unimodal, skewed to the right if $\alpha > 1$ and to the left if $0 < \alpha < 1$. Gupta and Gupta (2008) considered $\alpha$ as the skewness parameter of the distribution. For $0 < \alpha < 1$, however, the skewness to the left is not very pronounced, as can be seen from Figure 1, so the model fits left skewed data sets poorly. This motivates us to look for another skewed distribution for which the normal distribution is a special case and which, at the same time, is fit for modeling both kinds of skewed data. We take the extended model of Marshall and Olkin (1997), which adds a positive parameter to a general survival function.

Figure 1: Plot of power normal distribution for different values of α (vertical axis: $f_2(x; \alpha)$, horizontal axis: x)
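The remark about weak left skewness can be explored numerically. The following is a minimal Python sketch, assuming equations (1.4) and (1.5) as written above. The function names, the integration range and the Riemann-sum step are illustrative choices, and the range is assumed wide enough only for α of about 0.1 or larger.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal cdf; erfc keeps precision in the far left tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def power_normal_cdf(x, alpha):
    """F_2(x; alpha) of equation (1.4)."""
    return Phi(x) ** alpha


def power_normal_pdf(x, alpha):
    """f_2(x; alpha) of equation (1.5)."""
    return alpha * Phi(x) ** (alpha - 1.0) * phi(x)


def skewness_index(alpha, lo=-30.0, hi=12.0, step=0.01):
    """mu_3 / mu_2^(3/2) of the power normal, by a Riemann sum on [lo, hi]."""
    n = int(round((hi - lo) / step)) + 1
    xs = [lo + i * step for i in range(n)]
    ws = [power_normal_pdf(x, alpha) * step for x in xs]
    total = sum(ws)
    mean = sum(x * w for x, w in zip(xs, ws)) / total
    m2 = sum((x - mean) ** 2 * w for x, w in zip(xs, ws)) / total
    m3 = sum((x - mean) ** 3 * w for x, w in zip(xs, ws)) / total
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    for alpha in (0.1, 0.5, 1.0, 2.0, 5.0):
        print(alpha, round(skewness_index(alpha), 4))
```

Printing the index for a few values of α on either side of 1 lets the reader compare how strongly the two directions of skewness are expressed.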

Let X be a random variable with survival function and probability density function

$$\bar{F}_3(x; \beta) = \frac{\beta \bar{F}_0(x)}{1-(1-\beta)\bar{F}_0(x)}, \quad -\infty < x < \infty,\ \beta > 0, \tag{1.6}$$

and

$$f_3(x; \beta) = \frac{\beta f_0(x)}{[1-(1-\beta)\bar{F}_0(x)]^2}, \quad -\infty < x < \infty,\ \beta > 0, \tag{1.7}$$

respectively. Here $F_0(x)$ is the baseline distribution function, $\bar{F}_0(x) = 1 - F_0(x)$ is the baseline survival function and $f_0(x)$ is the corresponding probability density function. Taking $F_0(x) = \Phi(x)$ and $f_0(x) = \phi(x)$, we get the tilted normal (standard) distribution as

$$f_3(x; \beta) = \frac{\beta\phi(x)}{[1-(1-\beta)\{1-\Phi(x)\}]^2}, \quad -\infty < x < \infty,\ \beta > 0. \tag{1.8}$$

Here $\beta = 1$ gives the standard normal density function. The tilted normal density is unimodal, skewed to the left if $\beta > 1$ and to the right if $0 < \beta < 1$. Figure 2 displays a few pdf graphs for various values of $\beta$.

Figure 2: Plot of tilted normal distribution for different values of β (vertical axis: $f_3(x; \beta)$, horizontal axis: x)
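Equations (1.6) to (1.8) translate directly into code. The sketch below is a hedged Python illustration for the standard case; the helper names (`tn_sf`, `tn_cdf`, `tn_pdf`, `total_mass`) are hypothetical and the normalisation check uses a plain Riemann sum.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def sf_normal(x):
    """Standard normal survival function 1 - Phi(x), computed via erfc."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def tn_sf(x, beta):
    """Tilted normal survival function, equation (1.6) with a normal baseline."""
    s = sf_normal(x)
    return beta * s / (1.0 - (1.0 - beta) * s)


def tn_cdf(x, beta):
    """Tilted normal cdf, i.e. 1 - tn_sf(x, beta) written in closed form."""
    s = sf_normal(x)
    return (1.0 - s) / (1.0 - (1.0 - beta) * s)


def tn_pdf(x, beta):
    """Tilted normal density, equation (1.8)."""
    s = sf_normal(x)
    return beta * phi(x) / (1.0 - (1.0 - beta) * s) ** 2


def total_mass(beta, lo=-12.0, hi=12.0, step=0.01):
    """Riemann-sum check that the density integrates to one."""
    n = int(round((hi - lo) / step)) + 1
    return sum(tn_pdf(lo + i * step, beta) for i in range(n)) * step


if __name__ == "__main__":
    for beta in (0.5, 1.0, 3.0):
        print(beta, round(total_mass(beta), 8), round(tn_cdf(0.0, beta), 4))
```

One property that follows from (1.8) by direct substitution is the reflection identity $f_3(x; \beta) = f_3(-x; 1/\beta)$, so the curves for $\beta$ and $1/\beta$ are mirror images of each other.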

If the normal density has mean $\mu$ and variance $\sigma^2$, then the form of the tilted normal distribution will be

$$f_3(x; \beta, \mu, \sigma) = \frac{\frac{\beta}{\sigma}\,\phi\left(\frac{x-\mu}{\sigma}\right)}{\left[1-(1-\beta)\left\{1-\Phi\left(\frac{x-\mu}{\sigma}\right)\right\}\right]^2}, \quad -\infty < x, \mu < \infty,\ \beta, \sigma > 0. \tag{1.9}$$

We will denote this tilted normal distribution with parameters $\mu$, $\sigma$ and $\beta$ as TN($\mu$, $\sigma$, $\beta$).

The paper is organized as follows. In Section 2, we show that the TN distribution can be obtained as a compound distribution with an exponential mixing distribution. The basic structural properties of the proposed model, including the reliability properties, are presented in Section 3. In Section 4, we discuss the estimation of the parameters of the TN distribution. Section 5 is devoted to studying the closeness of the skew normal, power normal and tilted normal distributions. A data set is analyzed using the TN model in Section 6. Section 7 concludes.

2. Compounding

Let $\bar{G}(x \mid \theta)$, $-\infty < x < \infty$, $-\infty < \theta < \infty$, be the survival function (SF) of a random variable X given $\theta$. Let $\theta$ be a random variable with pdf $m(\theta)$. Then a distribution with survival function

$$\bar{G}(x) = \int_{-\infty}^{\infty} \bar{G}(x \mid \theta)\,m(\theta)\,d\theta, \quad -\infty < x < \infty,$$

is called a compound distribution with mixing density $m(\theta)$. Compound distributions provide a tool to obtain new parametric families of distributions from existing ones. They represent heterogeneous models in which the individuals of a population have different risks. The following theorem shows that the TN distribution can be obtained as a compound distribution.

Theorem 2.1 Suppose that the conditional SF of a continuous random variable X given $\Theta = \theta$ is given by

$$\bar{G}(x \mid \theta) = \exp\left[-\theta\,\frac{\Phi\left(\frac{x-\mu}{\sigma}\right)}{1-\Phi\left(\frac{x-\mu}{\sigma}\right)}\right], \quad -\infty < x, \mu < \infty,\ \theta, \sigma > 0.$$

Let $\theta$ have an exponential distribution with pdf $m(\theta) = \beta e^{-\beta\theta}$, $\theta, \beta > 0$. Then the compound distribution of X is the TN($\mu$, $\sigma$, $\beta$) distribution.
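As a numerical illustration of Theorem 2.1 (not a substitute for the proof), the compound construction can be simulated for the standard case $\mu = 0$, $\sigma = 1$. In this Python sketch the function names, the seed and the sample size are arbitrary illustrative choices. The conditional draw uses inversion: setting $\exp[-\theta\,\Phi/(1-\Phi)] = U$ with U uniform gives $\Phi(x) = E/(E+\theta)$, where $E = -\ln U$.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1)


def draw_compound(beta, rng):
    """One draw of X: theta ~ Exp(rate beta), then X | theta by inversion."""
    theta = rng.expovariate(beta)   # mixing density m(theta) = beta * exp(-beta * theta)
    e = rng.expovariate(1.0)        # e = -ln(U), U uniform on (0, 1)
    p = e / (e + theta)             # value of Phi(x) solving G-bar(x | theta) = U
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # stay inside the quantile function's open domain
    return STD_NORMAL.inv_cdf(p)


def tn_sf(x, beta):
    """Survival function of TN(0, 1, beta)."""
    s = 1.0 - STD_NORMAL.cdf(x)
    return beta * s / (1.0 - (1.0 - beta) * s)


def empirical_sf(x, beta, n=20000, seed=12345):
    """Fraction of simulated compound draws exceeding x."""
    rng = random.Random(seed)
    return sum(draw_compound(beta, rng) > x for _ in range(n)) / n


if __name__ == "__main__":
    for x in (-0.5, 0.5, 1.5):
        print(x, empirical_sf(x, 2.0), round(tn_sf(x, 2.0), 4))
```

The empirical survival fractions should agree with the TN survival function up to Monte Carlo error.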

Proof: For $-\infty < x < \infty$, $-\infty < \mu < \infty$, $\beta > 0$, the SF of X is given by

$$\bar{G}(x) = \int_0^{\infty} \bar{G}(x \mid \theta)\,m(\theta)\,d\theta = \beta\int_0^{\infty} \exp\left[-\theta\,\frac{\Phi\left(\frac{x-\mu}{\sigma}\right)}{1-\Phi\left(\frac{x-\mu}{\sigma}\right)}\right] e^{-\beta\theta}\,d\theta = \frac{\beta\left[1-\Phi\left(\frac{x-\mu}{\sigma}\right)\right]}{1-(1-\beta)\left[1-\Phi\left(\frac{x-\mu}{\sigma}\right)\right]},$$

which is the SF of the TN($\mu$, $\sigma$, $\beta$) distribution.

For fixed $\theta > 0$, the distribution with survival function $\bar{G}(x \mid \theta)$ gives a class of non-standard distributions. As a result of Theorem 2.1, compounding any of the distributions belonging to this class with an exponential mixing distribution for $\theta$ leads to a certain form of the TN distribution. This theorem also provides another way to interpret the TN distribution, in addition to the geometric-extreme stability property given by Marshall and Olkin (1997).

3. Basic Structural and Survival Properties

The expressions for the mean, variance and skewness of the TN($\mu$, $\sigma$, $\beta$) model do not seem to be available in compact form. For the proposed model, we have

$$E(X^r) = \int_{-\infty}^{\infty} x^r\,\frac{\frac{\beta}{\sigma}\,\phi\left(\frac{x-\mu}{\sigma}\right)}{\left[1-(1-\beta)\left\{1-\Phi\left(\frac{x-\mu}{\sigma}\right)\right\}\right]^2}\,dx.$$

Using this expression, with the help of a computer, we can find measures of central tendency, dispersion and the skewness index of the TN($\mu$, $\sigma$, $\beta$). We have presented the measures of central tendency (mean, median, mode), and the variance and skewness index $\sqrt{\beta_1} = \mu_3/\mu_2^{3/2}$ (where $\mu_2$ and $\mu_3$ are the second and third order central moments) for the TN(0, 1, $\beta$) in Figure 3 and Figure 4 respectively. The median M of the TN(0, 1, $\beta$) distribution is given by

$$M = \Phi^{-1}\left(\frac{\beta}{1+\beta}\right),$$

which is the $\beta/(1+\beta)$th quantile of the standard normal distribution. The mode is a solution of the equation

$$x = -\frac{2(1-\beta)\phi(x)}{1-(1-\beta)[1-\Phi(x)]}.$$

From Figure 3 and Figure 4, it is found that, as expected, the median is less than the mean for $\beta < 1$. The variance is increasing for $\beta < 1$ and decreasing for $\beta > 1$, and the skewness index lies between -1 and 1 for the TN(0, 1, $\beta$) variate.
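The quantities plotted in Figures 3 and 4 can be reproduced approximately by numerical integration. The following Python sketch is one way to do it for TN(0, 1, β). The function names, the Riemann-sum grid and the use of bisection for the mode equation are illustrative choices, not the authors' computational method.

```python
import math
from statistics import NormalDist


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def sf_normal(x):
    """Standard normal survival function 1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def tn_pdf(x, beta):
    """Density of TN(0, 1, beta), equation (1.8)."""
    return beta * phi(x) / (1.0 - (1.0 - beta) * sf_normal(x)) ** 2


def tn_median(beta):
    """M = Phi^{-1}(beta / (1 + beta))."""
    return NormalDist().inv_cdf(beta / (1.0 + beta))


def tn_mode(beta, lo=-10.0, hi=10.0):
    """Root of x + 2(1-beta)phi(x) / [1 - (1-beta){1 - Phi(x)}] = 0 by bisection."""
    def g(x):
        return x + 2.0 * (1.0 - beta) * phi(x) / (1.0 - (1.0 - beta) * sf_normal(x))
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def tn_moments(beta, lo=-12.0, hi=12.0, step=0.01):
    """Mean, variance and skewness index mu_3 / mu_2^(3/2) by a Riemann sum."""
    n = int(round((hi - lo) / step)) + 1
    xs = [lo + i * step for i in range(n)]
    ws = [tn_pdf(x, beta) * step for x in xs]
    total = sum(ws)
    mean = sum(x * w for x, w in zip(xs, ws)) / total
    m2 = sum((x - mean) ** 2 * w for x, w in zip(xs, ws)) / total
    m3 = sum((x - mean) ** 3 * w for x, w in zip(xs, ws)) / total
    return mean, m2, m3 / m2 ** 1.5


if __name__ == "__main__":
    for beta in (0.25, 0.5, 1.0, 2.0, 4.0):
        mean, var, skew = tn_moments(beta)
        print(beta, round(mean, 4), round(tn_median(beta), 4),
              round(tn_mode(beta), 4), round(var, 4), round(skew, 4))
```

For $\beta < 1$ the output should show the median below the mean, in line with the remark on Figure 3.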

Figure 3: Plot of mean, median and mode of tilted normal distribution for different values of β

Figure 4: Plot of variance and skewness index of tilted normal distribution for different values of β

Closed form expressions for the basic structural properties are not available. Therefore, we now concentrate on the survival properties of the TN distribution. The failure rate function of X is given by

$$h_{TN}(x) = \frac{1}{1-(1-\beta)[1-\Phi(x)]}\cdot\frac{\phi(x)}{1-\Phi(x)} = \frac{1}{1-(1-\beta)[1-\Phi(x)]}\,h_N(x),$$

with the integrated failure rate function

$$H_{TN}(x) = -\ln\frac{\beta[1-\Phi(x)]}{1-(1-\beta)[1-\Phi(x)]}.$$

The reversed failure rate function of X is given by

$$r_{TN}(x) = \frac{\beta}{1-(1-\beta)[1-\Phi(x)]}\cdot\frac{\phi(x)}{\Phi(x)} = \frac{\beta}{1-(1-\beta)[1-\Phi(x)]}\,r_N(x).$$

Here $h_N(x)$ and $r_N(x)$ are the failure rate and reversed failure rate functions of the N(0, 1) distribution respectively. Figure 5 and Figure 6 show the failure rates and reversed failure rates of the tilted normal distribution for different values of $\beta$ respectively.

Figure 5: Plot of failure rate of tilted normal distribution for different values of β
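The three survival functions above are simple to evaluate. The Python sketch below is illustrative only: the function names are hypothetical, and the Simpson-rule helper is included just to check that the integrated failure rate is consistent with the failure rate.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def sf_normal(x):
    """Standard normal survival function 1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def tn_hazard(x, beta):
    """Failure rate h_TN(x) = h_N(x) / [1 - (1-beta){1 - Phi(x)}]."""
    s = sf_normal(x)
    return phi(x) / (s * (1.0 - (1.0 - beta) * s))


def tn_cumulative_hazard(x, beta):
    """Integrated failure rate H_TN(x) = -ln of the TN survival function."""
    s = sf_normal(x)
    return math.log((1.0 - (1.0 - beta) * s) / (beta * s))


def tn_reversed_hazard(x, beta):
    """Reversed failure rate r_TN(x) = beta * r_N(x) / [1 - (1-beta){1 - Phi(x)}]."""
    s = sf_normal(x)
    cdf = 0.5 * math.erfc(-x / math.sqrt(2.0))
    return beta * phi(x) / (cdf * (1.0 - (1.0 - beta) * s))


def simpson(fun, a, b, panels=200):
    """Composite Simpson rule with an even number of panels."""
    h = (b - a) / panels
    total = fun(a) + fun(b)
    for k in range(1, panels):
        total += (4.0 if k % 2 else 2.0) * fun(a + k * h)
    return total * h / 3.0


if __name__ == "__main__":
    for beta in (1.0, 2.0, 3.0):
        print(beta, [round(tn_hazard(x, beta), 4) for x in (-2.0, 0.0, 2.0)])
```

For $\beta \ge 1$ the factor multiplying $h_N(x)$ is non-decreasing in x, so the increasing failure rate of the normal baseline carries over.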

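Figure 6: Plot of reversed failure rate of tilted normal distribution for different values of β

Theorem 3.1 If $X \sim TN(0, 1, \beta_1)$, $Y \sim TN(0, 1, \beta_2)$, and $\beta_1 > \beta_2$, then $Y <_{LR} X \Rightarrow Y <_{HAZ} X \Rightarrow Y <_{ST} X$ and $Y <_{HAZ} X \Rightarrow Y <_{MRL} X$. Here $Y <_{W} X$ means Y is smaller than X in the W ordering.

Proof: The logarithm of the likelihood ratio

$$v(x) = \ln\frac{f_3(x, \beta_1)}{f_3(x, \beta_2)} = \ln\frac{\beta_1}{\beta_2} + 2\left[\ln\{1-(1-\beta_2)(1-\Phi(x))\} - \ln\{1-(1-\beta_1)(1-\Phi(x))\}\right]$$

is an increasing function of x if $\beta_1 > \beta_2$, since

$$v'(x) = \frac{2\phi(x)(\beta_1-\beta_2)}{[1-(1-\beta_1)(1-\Phi(x))][1-(1-\beta_2)(1-\Phi(x))]} > 0$$

for all x. Therefore, the tilted normal distribution has the likelihood ratio ordering, which implies that it has the failure rate ordering as well as the stochastic ordering and the mean residual life ordering, see Gupta and Kirmani (1987).

4. Maximum Likelihood Estimation

Let $X_i$, i = 1, 2, ..., n, be the ith observation and let $Y_i$ be the corresponding censoring point. The $X_i$'s and $Y_i$'s are assumed independent. The $X_i$'s are assumed to have the TN($\mu$, $\sigma$, $\beta$) distribution and the $Y_i$'s are assumed to have a non-informative distribution, i.e. a distribution that does not involve the parameters $\mu$, $\sigma$ and $\beta$. One observes only the pair $(T_i, \delta_i)$, where $T_i = \min(X_i, Y_i)$ and $\delta_i = I(X_i \le Y_i)$ is the censoring indicator. Given the data $\{(t_i, \delta_i),\ i = 1, 2, \ldots, n\}$, the likelihood function is given by

$$L_n(\mu, \sigma, \beta) = \prod_{i=1}^{n} \{f_3(t_i; \mu, \sigma, \beta)\}^{\delta_i}\{1-F_3(t_i; \mu, \sigma, \beta)\}^{1-\delta_i}$$

$$= \prod_{i=1}^{n} \left[\frac{1}{\sigma}\,\frac{\phi\left(\frac{t_i-\mu}{\sigma}\right)}{1-\Phi\left(\frac{t_i-\mu}{\sigma}\right)}\cdot\frac{1}{1-(1-\beta)\left\{1-\Phi\left(\frac{t_i-\mu}{\sigma}\right)\right\}}\right]^{\delta_i}\left[\frac{\beta\left\{1-\Phi\left(\frac{t_i-\mu}{\sigma}\right)\right\}}{1-(1-\beta)\left\{1-\Phi\left(\frac{t_i-\mu}{\sigma}\right)\right\}}\right].$$

Maximizing the likelihood function is equivalent to maximizing its logarithm. Hence the log-likelihood $l(\mu, \sigma, \beta) = \ln L_n(\mu, \sigma, \beta)$ is given by

$$l(\mu, \sigma, \beta) = -\ln\sigma\sum_{i=1}^{n}\delta_i + n\ln\beta + \sum_{i=1}^{n}\delta_i\ln\phi\left(\frac{t_i-\mu}{\sigma}\right) + \sum_{i=1}^{n}(1-\delta_i)\ln\left\{1-\Phi\left(\frac{t_i-\mu}{\sigma}\right)\right\} - \sum_{i=1}^{n}(1+\delta_i)\ln\left[1-(1-\beta)\left\{1-\Phi\left(\frac{t_i-\mu}{\sigma}\right)\right\}\right].$$

Maximum likelihood estimates (MLEs) of the parameters $\beta$, $\mu$, $\sigma$ are obtained by solving the non-linear equations $\partial l/\partial\beta = 0$, $\partial l/\partial\mu = 0$, $\partial l/\partial\sigma = 0$. Hence, the iterative solutions are

$$\beta = n\left[\sum_{i=1}^{n}\frac{(1+\delta_i)\left\{1-\Phi\left(\frac{t_i-\mu}{\sigma}\right)\right\}}{1-(1-\beta)\left\{1-\Phi\left(\frac{t_i-\mu}{\sigma}\right)\right\}}\right]^{-1},$$

$$\mu = \frac{1}{\sum_{i=1}^{n}\delta_i}\left[\sum_{i=1}^{n}\delta_i t_i + \sigma\sum_{i=1}^{n}\frac{(1-\delta_i)\,\phi\left(\frac{t_i-\mu}{\sigma}\right)}{1-\Phi\left(\frac{t_i-\mu}{\sigma}\right)} + \sigma\sum_{i=1}^{n}\frac{(1+\delta_i)(1-\beta)\,\phi\left(\frac{t_i-\mu}{\sigma}\right)}{1-(1-\beta)\left\{1-\Phi\left(\frac{t_i-\mu}{\sigma}\right)\right\}}\right],$$

and

$$\sigma^2 = \frac{1}{\sum_{i=1}^{n}\delta_i}\left[\sum_{i=1}^{n}\delta_i(t_i-\mu)^2 + \sigma\sum_{i=1}^{n}\frac{(1-\delta_i)(t_i-\mu)\,\phi\left(\frac{t_i-\mu}{\sigma}\right)}{1-\Phi\left(\frac{t_i-\mu}{\sigma}\right)} + \sigma\sum_{i=1}^{n}\frac{(1+\delta_i)(t_i-\mu)(1-\beta)\,\phi\left(\frac{t_i-\mu}{\sigma}\right)}{1-(1-\beta)\left\{1-\Phi\left(\frac{t_i-\mu}{\sigma}\right)\right\}}\right].$$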
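The censored log-likelihood and its three partial derivatives can be coded directly, which is useful for checking any implementation of the iterative equations. The Python sketch below is illustrative: the function names are hypothetical, and the partial derivatives are written in the form obtained by differentiating the log-likelihood term by term. Setting each derivative to zero and rearranging yields the iterative expressions for $\beta$, $\mu$ and $\sigma^2$.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def sf_normal(z):
    """Standard normal survival function 1 - Phi(z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def log_likelihood(mu, sigma, beta, times, deltas):
    """l(mu, sigma, beta) for right-censored data (delta = 1: observed failure)."""
    total = 0.0
    for t, d in zip(times, deltas):
        z = (t - mu) / sigma
        s = sf_normal(z)
        denom = 1.0 - (1.0 - beta) * s
        total += math.log(beta) - (1.0 + d) * math.log(denom)
        total += d * (math.log(phi(z)) - math.log(sigma)) + (1.0 - d) * math.log(s)
    return total


def score(mu, sigma, beta, times, deltas):
    """Partial derivatives (dl/dmu, dl/dsigma, dl/dbeta) of the log-likelihood."""
    g_mu = g_sigma = g_beta = 0.0
    for t, d in zip(times, deltas):
        z = (t - mu) / sigma
        s = sf_normal(z)
        denom = 1.0 - (1.0 - beta) * s
        common = d * z + (1.0 - d) * phi(z) / s + (1.0 + d) * (1.0 - beta) * phi(z) / denom
        g_mu += common / sigma
        g_sigma += (z * common - d) / sigma
        g_beta += 1.0 / beta - (1.0 + d) * s / denom
    return g_mu, g_sigma, g_beta


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the paper.
    toy_times = [1.2, 0.4, -0.3, 2.1, 0.9, 1.5]
    toy_deltas = [1, 1, 0, 1, 0, 1]
    print(log_likelihood(0.5, 1.3, 2.0, toy_times, toy_deltas))
    print(score(0.5, 1.3, 2.0, toy_times, toy_deltas))
```

A root of all three score components can then be sought with any general-purpose optimizer or with the fixed-point form of the equations; the sketch makes no claim about the convergence of either.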

For the complete sample, $\delta_i = 1$ for all i, and for the type-I censored sample, $Y_i = t_0$ for all i.

The large-sample variance-covariance matrix of the MLEs is obtained by inverting the matrix of minus second order derivatives of the log-likelihood evaluated at the MLEs.

The likelihood ratio test will be used to test the null hypothesis $H_0: \beta = 1$ (normal distribution). When $H_0$ is true, the deviance test statistic $d_n = -2[l(\hat\mu_0, \hat\sigma_0, 1) - l(\hat\mu, \hat\sigma, \hat\beta)]$, where $\hat\mu_0$, $\hat\sigma_0$ are the MLEs of $\mu$, $\sigma$ under $H_0: \beta = 1$, has approximately a chi-square distribution with 1 degree of freedom. In addition, for model selection, we use the Akaike Information Criterion (AIC) defined as

AIC = -2 log-likelihood + 2k,

where k is the number of parameters in the model. For more details about the AIC, see Akaike (1969). The model with minimum AIC is the one that fits the data best.

5. Closeness of Skew Normal, Power Normal and Tilted Normal Distribution

It should be noted that the cumulative distribution functions of the skew normal, power normal and tilted normal distributions are

$$F_1(x, \lambda) = 2\int_{-\infty}^{x}\Phi(\lambda t)\phi(t)\,dt, \quad -\infty < \lambda < \infty,$$

$$F_2(x, \alpha) = \Phi^{\alpha}(x), \quad \alpha > 0,$$

and

$$F_3(x, \beta) = \frac{\Phi(x)}{1-(1-\beta)\{1-\Phi(x)\}}, \quad \beta > 0,$$

respectively. $F_1(x, \lambda)$, $F_2(x, \alpha)$ and $F_3(x, \beta)$ are exactly equal to the normal distribution function when $\lambda = 0$, $\alpha = 1$ and $\beta = 1$. $F_1(x, \lambda)$ and $F_2(x, \alpha)$ are also exactly equal when $\lambda = 1$ and $\alpha = 2$. Since all the models can be used to model positively or negatively skewed data, it might be possible to approximate $F_1(x, \lambda)$ by $F_2(x, \alpha)$ or $F_3(x, \beta)$. To investigate this, we apply the following procedure:

1. Set a value of $\lambda$.
2. Find a and b such that a is the largest x at which $2\int_{-\infty}^{x}\Phi(\lambda t)\phi(t)\,dt$ does not exceed a small lower-tail probability, and $b = \min\left\{x : 2\int_{-\infty}^{x}\Phi(\lambda t)\phi(t)\,dt \ge 0.999\right\}$.

3. Divide the range [a, b] into 100 parts, say $a = x_0 \le x_1 \le \cdots \le x_{100} = b$, and set $y_i = F_1(x_i, \lambda)$; i = 0, 1, 2, ..., 100.
4. Find $\alpha$ by minimizing $S_1^2 = \sum_{i=0}^{100}[y_i - F_2(x_i, \alpha)]^2$, to fit the model $y \approx F_2(x, \alpha)$.
5. Find the Kolmogorov discrepancy measure $D_{12} = \max_{x_i}|F_2(x_i, \hat\alpha) - F_1(x_i, \lambda)|$.
6. Find $\beta$ by minimizing $S_2^2 = \sum_{i=0}^{100}[y_i - F_3(x_i, \beta)]^2$, to fit the model $y \approx F_3(x, \beta)$.
7. Find the Kolmogorov discrepancy measure $D_{13} = \max_{x_i}|F_3(x_i, \hat\beta) - F_1(x_i, \lambda)|$.

In Table 1, we present the values of $\hat\alpha$ and $D_{12}$ corresponding to various values of $\lambda$, and the values of $\hat\beta$ and $D_{13}$ corresponding to various values of $\alpha$ (assuming $\hat\alpha$ as $\alpha$ in this case).

Table 1: Kolmogorov discrepancy measure for various values of λ and corresponding values of α and β

λ    $\hat\alpha$    $D_{12}$    $\hat\beta$    $D_{13}$

From the table it is clear that the discrepancy among the skew normal, power normal and tilted normal is minimal when $\lambda = 0$, $\alpha = 1$ and $\beta = 1$: the three models then coincide with the standard normal distribution and there is, obviously, no discrepancy among them. The skew normal and power normal distributions are also identical when $\lambda = 1$ and $\alpha = 2$. The power normal distribution provides a better approximation to the skew normal when $\lambda$ is positive than when $\lambda$ is negative, and its Kolmogorov discrepancy measure does not exhibit symmetric behaviour for values of $\lambda$ moving away from zero in both directions. On the other hand, the tilted normal distribution approximates with equal efficiency for values of $\lambda$ moving away from zero in both directions, as its Kolmogorov discrepancy measure exhibits symmetric behaviour. It provides a better approximation than the power normal to the skew normal when $\lambda$ is negative. The relative Kolmogorov discrepancy of the tilted normal and power normal is smaller for positive values of $\lambda$ than for negative values of $\lambda$.

6. Data Analysis

In this section, we consider a data set to fit with the tilted normal distribution. The data, related to failure times of bus motors, are taken from Davis (1952). We consider the initial bus motor failures data of Davis (1952, p. 145). The data contain distance intervals (in thousands of miles) and the corresponding observed numbers of failures, and are shown in Table 2. The histogram shows that the data set is negatively (left-) skewed with coefficient of skewness $\sqrt{b_1} = m_3/m_2^{3/2}$ = (where $m_2$ and $m_3$ are the sample second and third order central moments). Davis fitted this data to the normal distribution with observed $\chi^2$ = 9.93 having p-value

We have fitted this data with the normal, the skew normal, the power normal and the tilted normal distributions. The results are summarized in Table 3. The histogram and the fitted tilted normal, skew normal, power normal and normal curves are shown in Figure 7.

Table 2: Initial bus motor failures

Distance Interval, Thousands of Miles    Observed Number of Failures
Less than
up                                       2
Total                                    191

Table 3: Summarized results of fitting different distributions to data set of Davis (1952)

Distribution     $\hat\mu$    $\hat\sigma$    Estimate of tilt parameter    log-likelihood    AIC
Normal
Power normal                                  $\hat\alpha$ =
Skew normal                                   $\hat\lambda$ =
Tilted normal                                 $\hat\beta$ =
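The seven-step approximation procedure of Section 5 can be sketched in Python as follows. Several details are assumptions made for illustration and are not taken from the paper: the lower-tail probability 0.001 (mirroring the 0.999 used for b), Simpson's rule for $F_1$, a golden-section search over the logarithm of the parameter restricted to [0.01, 100], and all function names.

```python
import math

SQRT2 = math.sqrt(2.0)


def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    return 0.5 * math.erfc(-x / SQRT2)


def skew_normal_cdf(x, lam, lower=-10.0, panels=1000):
    """F_1(x, lam) = 2 * int_{-inf}^{x} Phi(lam t) phi(t) dt by Simpson's rule."""
    if x <= lower:
        return 0.0
    h = (x - lower) / panels
    total = 2.0 * Phi(lam * lower) * phi(lower) + 2.0 * Phi(lam * x) * phi(x)
    for k in range(1, panels):
        t = lower + k * h
        total += (4.0 if k % 2 else 2.0) * 2.0 * Phi(lam * t) * phi(t)
    return total * h / 3.0


def power_normal_cdf(x, alpha):
    return Phi(x) ** alpha


def tilted_normal_cdf(x, beta):
    p = Phi(x)
    return p / (1.0 - (1.0 - beta) * (1.0 - p))


def quantile(cdf, p, lo=-10.0, hi=10.0):
    """Bisection for the point where an increasing cdf crosses p."""
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def golden_min(fun, lo, hi, tol=1e-10):
    """Golden-section search; assumes fun is unimodal on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    fc, fd = fun(c), fun(d)
    while hi - lo > tol:
        if fc < fd:
            hi, d, fd = d, c, fc
            c = hi - g * (hi - lo)
            fc = fun(c)
        else:
            lo, c, fc = c, d, fd
            d = lo + g * (hi - lo)
            fd = fun(d)
    return 0.5 * (lo + hi)


def closeness(lam, tail=0.001, parts=100):
    """Steps 1 to 7: returns (alpha_hat, D12, beta_hat, D13) for a given lambda."""
    a = quantile(lambda x: skew_normal_cdf(x, lam), tail)
    b = quantile(lambda x: skew_normal_cdf(x, lam), 1.0 - tail)
    xs = [a + (b - a) * i / parts for i in range(parts + 1)]
    ys = [skew_normal_cdf(x, lam) for x in xs]

    def fit(cdf):
        def sse(log_par):
            par = math.exp(log_par)
            return sum((y - cdf(x, par)) ** 2 for x, y in zip(xs, ys))
        par = math.exp(golden_min(sse, math.log(0.01), math.log(100.0)))
        return par, max(abs(cdf(x, par) - y) for x, y in zip(xs, ys))

    alpha_hat, d12 = fit(power_normal_cdf)
    beta_hat, d13 = fit(tilted_normal_cdf)
    return alpha_hat, d12, beta_hat, d13


if __name__ == "__main__":
    for lam in (-2.0, 2.0):
        print(lam, [round(v, 4) for v in closeness(lam)])
```

Running the sketch for $\lambda$ and $-\lambda$ gives a quick check of the symmetric behaviour of $D_{13}$ discussed in Section 5.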

Figure 7: Plot of histogram and fitted tilted normal, skew normal, power normal and normal curves to data of Davis (1952)

Here, for the tilted normal fit, the deviance test statistic is $d_n$ = with p-value . This information supports the better fit of the tilted normal distribution to the data. Since the data are related to failure times, we have presented the estimated failure rate functions for the normal, the power normal, the skew normal and the tilted normal models in Figure 8. The estimated failure rate shows an increasing pattern.

Figure 8: Plot of estimated failure rate functions of fitted tilted normal, skew normal, power normal and normal curves to data of Davis (1952) (vertical axis: r(x), horizontal axis: x)
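The deviance test and the AIC comparison used in this section reduce to a few lines of arithmetic once the maximized log-likelihoods are known. The Python sketch below uses hypothetical log-likelihood values, not the values of Table 3, and hypothetical function names. The p-value uses the identity $P(\chi^2_1 > d) = \mathrm{erfc}(\sqrt{d/2})$.

```python
import math


def aic(loglik, k):
    """AIC = -2 log-likelihood + 2k, with k the number of fitted parameters."""
    return -2.0 * loglik + 2.0 * k


def deviance_test(loglik_null, loglik_alt):
    """Deviance d_n for H0: beta = 1 and its chi-square(1) p-value."""
    d = -2.0 * (loglik_null - loglik_alt)
    p_value = math.erfc(math.sqrt(max(d, 0.0) / 2.0))
    return d, p_value


if __name__ == "__main__":
    # Hypothetical maximized log-likelihoods, for illustration only.
    ll_normal, ll_tilted = -100.0, -97.5
    print(deviance_test(ll_normal, ll_tilted))
    print(aic(ll_normal, 2), aic(ll_tilted, 3))
```

Here the normal model has k = 2 parameters ($\mu$, $\sigma$) and the tilted normal has k = 3 ($\mu$, $\sigma$, $\beta$).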

7. Concluding Remarks

In this article, the tilted normal distribution has been studied as a possible alternative to the skew normal distribution for left skewed data. The practical problems in estimating the skewness parameter of the skew normal distribution for small to moderate sample sizes have been discussed. To alleviate these problems, an alternative model called the power normal model was proposed by Gupta and Gupta (2008). It has been observed that the power normal model is not a good fit for data skewed to the left. As an alternative, the tilted normal distribution has been proposed. The normal distribution is widely and comfortably used by practitioners as well as theoreticians for analyzing symmetric data. Since skewed data can be analyzed by the tilted normal distribution, which is obtained by adding a single parameter, it should be easy for users to understand and apply. The structural and survival properties of this distribution have been studied and inference on its parameters has also been discussed. The appropriateness of fitting the tilted normal distribution has been established by analyzing a data set.

Acknowledgements

The authors would like to thank the referee for a very careful reading of the manuscript and for a number of suggestions which improved the earlier version of the manuscript. The authors are also thankful to Dr. M. M. Panja, Department of Mathematics, Visva-Bharati University, for computational help.

References

Akaike, H. (1969). Fitting autoregressive models for prediction. Annals of the Institute of Statistical Mathematics 21.
Arnold, B. C. and Beaver, R. J. (2002). Skewed multivariate models related to hidden truncation and/or selective reporting. Test 11.
Arnold, B. C., Beaver, R. J., Groeneveld, R. A. and Meeker, W. Q. (1993). The nontruncated marginal of a truncated bivariate normal distribution. Psychometrika 58.
Azzalini, A. (1985). A class of distributions which includes the normal ones. Scandinavian Journal of Statistics 12.
Azzalini, A. and Capitanio, A. (1999). Statistical applications of the multivariate skew-normal distribution. Journal of the Royal Statistical Society, Series B 61.

Davis, D. J. (1952). An analysis of some failure data. Journal of the American Statistical Association 47.
Genton, M. G. (2004). Skew-Elliptical Distributions and Their Applications: A Journey Beyond Normality. Chapman and Hall, New York.
Gupta, R. C. (2006). Variance residual life function in reliability studies. Metron - International Journal of Statistics LXIV.
Gupta, R. C. and Kirmani, S. N. U. A. (1987). On the order relations between reliability measures of two distributions. Stochastic Models 3.
Gupta, R. D. and Gupta, R. C. (2008). Analyzing skewed data by power normal model. Test 17.
Liseo, B. (1990). La classe delle densità normali sghembe: aspetti inferenziali da un punto di vista bayesiano (in Italian). Statistica L.
Marshall, A. W. and Olkin, I. (1997). A new method for adding a parameter to a family of distributions with application to the exponential and Weibull families. Biometrika 84.
Monti, A. C. (2003). A note on the estimation of the skew normal and the skew exponential power distributions. Metron - International Journal of Statistics LXI.

Received July 18, 2011; accepted October 23,

Sudhansu S. Maiti
Department of Statistics
Visva-Bharati University
Santiniketan, India
dssm1@rediffmail.com

Mithu Dey
Department of Basic Sciences (Mathematics)
Asansol Engineering College
Vivekananda Sarani, Kanyapur, India
mithu.dey@gmail.com