Generating Random Samples from the Generalized Pareto Mixture Model

MUSTAFA ÇAVUŞ, AHMET SEZER, BERNA YAZICI
Department of Statistics, Anadolu University, Eskişehir, TURKEY

Abstract: The Generalized Pareto Distribution is a useful tool for modeling in many areas of economics, finance and insurance, and it is commonly used for extreme value problems. Extreme value problems, such as those in the insurance sector, focus on the values that exceed a finite threshold, and the Generalized Pareto Distribution is a good approach for modeling samples that include such extreme values. Sometimes the samples of interest have a heterogeneous distribution. In such cases, mixture models are a better way of modeling the data. In this study, we generate random samples from the Generalized Pareto Mixture Distribution for modeling heterogeneous data. For this purpose, we use two three-parameter Generalized Pareto Distributions as components of the Generalized Pareto Mixture Distribution. The Inverse Transformation Method is used to generate the random samples in the simulation study. The parameters of the mixture model (shape, scale and location) are fixed. After the random samples are generated, the Chi-Square Goodness-of-Fit Test is used to check whether they follow the Generalized Pareto Distribution. Then the appropriate samples are combined according to the mixture weights of the chosen mixture model. The R statistical programming language is used in the simulation study.

Key words: The Generalized Pareto Mixture Distribution, Mixture Models, The Inverse Transformation Method, Chi-Square Goodness-of-Fit Test, Generating Random Samples, Pareto Distribution

ISBN:

1. Introduction

The Generalized Pareto Distribution (GPD) was introduced by Pickands (1975) [1]. It was further studied by Davison and Smith (1984) and Castillo (1997, 2008) [2][3]. Mierlus-Mazilu (2010) studied the generalized Pareto distribution, in particular the generation of random samples from it by the inverse transformation method [4]. That study did not apply any goodness-of-fit test to the generated random samples.

The generalized Pareto distribution is an important tool for modeling economic, financial and insurance data. Companies that work in these areas want to plan their financial parameters. For instance, a financial crisis or a similar unexpected event can damage a company. To limit such damage, companies should model extreme events with statistical distributions. In this way, they can anticipate extreme and damaging events that affect their financial balance.

In this study, we focus on how to generate random samples from the generalized Pareto mixture model and, in particular, we test whether the samples fit the model well by means of the chi-square goodness-of-fit test. The generalized Pareto mixture model is an important tool when the data have a heterogeneous distribution. In real life, risks in the financial sector can be heterogeneously distributed, so the generalized Pareto mixture model is used in these areas.

Until now, the chi-square goodness-of-fit test has not been used in studies on generating random samples. For example, Beirlant (2006) used the Jackson statistic as a goodness-of-fit test for random samples generated from distributions with Pareto-type behavior [5]. That study relies on the fact that log-transformed Pareto random variables are exponentially distributed and applies the Jackson statistic, originally proposed as a goodness-of-fit statistic for testing exponentiality.

2. Generalized Pareto Mixture Model

2.1. Pareto Distribution

The Pareto Distribution is generally used for modeling income data. It was proposed by the Italian economist and sociologist Vilfredo Federico Damaso Pareto (1897). The distribution is built on the Pareto Principle, according to which a large portion of the wealth of many societies is owned by a small percentage of the people in that society. The principle is often stated more simply as the rule that 20% of the population owns 80% of the wealth [7]. The Pareto Distribution is also useful for modeling in finance, actuarial science and economics. It can be used in many situations in which an equilibrium is found in the distribution of the small to the large.

The probability density function of the Pareto Distribution is

$$f(x) = \frac{\alpha\, x_m^{\alpha}}{x^{\alpha+1}}, \qquad x \ge x_m .$$

The Pareto Distribution is characterized by $\alpha > 0$, the shape parameter, which governs the heaviness of the upper tail, and $x_m > 0$, the scale parameter. The most commonly used notation is $X \sim \mathrm{Pareto}(\alpha, x_m)$, with shape parameter $\alpha$ and scale parameter $x_m$. Table 1 lists the cumulative distribution function, moment generating function, expected value and variance of the Pareto Distribution.

Cumulative Distribution: $F(x) = 1 - \left(\frac{x_m}{x}\right)^{\alpha}$, $x \ge x_m$
Moment Generation: $M(t) = \alpha(-x_m t)^{\alpha}\,\Gamma(-\alpha, -x_m t)$, $t < 0$
Expected Value: $E(X) = \frac{\alpha x_m}{\alpha - 1}$, $\alpha > 1$
Variance: $\mathrm{Var}(X) = \frac{\alpha x_m^{2}}{(\alpha-1)^{2}(\alpha-2)}$, $\alpha > 2$

Table 1: Characteristics of Pareto Distribution

2.2. Generalized Pareto Distribution

The Generalized Pareto Distribution is one of the most preferred models for extreme value problems. Extreme Value Theory is concerned with the values that exceed a finite threshold. In the actuarial sector in particular, the payments of companies consist of regular policy payments that stay within expected limits. If unexpected events occur, the payments of companies can rise sharply. Insurance companies therefore need to model rare events, which have a large effect on payments.

Let $X$ be a random variable with distribution $F$. The values of interest are those that exceed a finite threshold of this distribution. The Generalized Pareto Distribution is applied to model these values, which are called rare events and form the right tail of the distribution. The Generalized Pareto Distribution was introduced by Pickands (1975) and has been further studied by Davison and Smith (1984) and Castillo (1997, 2008). It has a wide range of applications in economics, insurance and finance [1][2][3].

The probability density function of the Generalized Pareto Distribution is

$$f(x) = \frac{1}{\sigma}\left(1 + \xi\,\frac{x-\mu}{\sigma}\right)^{-\frac{1}{\xi}-1}, \qquad \xi \ne 0,$$

for $x \ge \mu$ when $\xi > 0$ and for $\mu \le x \le \mu - \sigma/\xi$ when $\xi < 0$.

The Generalized Pareto Distribution is characterized by three parameters. $\xi$ is the shape parameter, which ranges from negative infinity to positive infinity and governs the heaviness of the upper tail. $\sigma > 0$ is the scale parameter. $\mu$ is the location parameter, which can be interpreted as the threshold. Table 2 lists the cumulative distribution function, moment generating function, expected value and variance of the Generalized Pareto Distribution.

Cumulative Distribution: $F(x) = 1 - \left(1 + \xi\,\frac{x-\mu}{\sigma}\right)^{-1/\xi}$
Moment Generation: $M(t) = e^{\mu t}\sum_{j=0}^{\infty}\frac{(\sigma t)^{j}}{\prod_{k=0}^{j}(1-k\xi)}$, $k\xi < 1$
Expected Value: $E(X) = \mu + \frac{\sigma}{1-\xi}$, $\xi < 1$
Variance: $\mathrm{Var}(X) = \frac{\sigma^{2}}{(1-\xi)^{2}(1-2\xi)}$, $\xi < 1/2$

Table 2: Characteristics of The Generalized Pareto Distribution

2.3. Mixture Model

In statistics, many methods are available for modeling data. These methods are known as statistical distributions, and they are classified into discrete and continuous probability distributions. Data analysis uses the statistical distribution that is appropriate to the behavior of the data, and every analysis aims to describe the data at hand as well as possible. Occasionally, the data are spread over different clusters, as can be seen in a plot of their distribution. This difference can also be called heterogeneity in the distribution.
For such cases, the literature offers an approach known as mixture models. Mixture models provide a natural representation of heterogeneity in a number of clusters. They describe more complex probability distributions, such as the clustered data mentioned earlier, by combining several probability distributions. The combined distributions can be the same distribution with different parameters or different distributions altogether. These distributions are combined with mixture weights. The probability density of a mixture model is written as

$$f(x) = \sum_{i=1}^{k} w_i\, f_i(x \mid \theta_i).$$

The mixture model is weighted by the parameters $w_i$, which define the weight of each component distribution, and $\theta_i$ denotes the parameter of the $i$-th component distribution. The formula shows a single parameter per component, but the number of parameters can differ according to the component distributions. The model can be constructed by combining two or more probability density functions. The choice of component distributions depends on the data analyst and on his or her experience.
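The weighted combination of component densities described above can be sketched in a few lines. This is a minimal illustration in Python rather than the R code used in the study; the function names `gpd_pdf` and `mixture_pdf` and the component parameters in the usage note are hypothetical, and the components are assumed to be three-parameter generalized Pareto densities as in Section 2.2.

```python
import math


def gpd_pdf(x, shape, scale, loc):
    """Density of a three-parameter Generalized Pareto Distribution.

    Assumed parameterization: shape (xi), scale (sigma > 0), location (mu).
    """
    z = (x - loc) / scale
    if z < 0:
        return 0.0  # below the location (threshold) the density is zero
    if shape == 0:
        return math.exp(-z) / scale  # limiting exponential case
    t = 1.0 + shape * z
    if t <= 0:
        return 0.0  # beyond the upper endpoint when shape < 0
    return t ** (-1.0 / shape - 1.0) / scale


def mixture_pdf(x, weights, components):
    """Weighted sum of component densities.

    weights: mixture weights, expected to sum to 1.
    components: one (shape, scale, loc) tuple per component.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(w * gpd_pdf(x, *params) for w, params in zip(weights, components))
```

For example, `mixture_pdf(2.5, (0.3, 0.7), ((0.5, 1.0, 0.0), (0.2, 2.0, 1.0)))` evaluates a two-component mixture at `x = 2.5`. The weights 0.3 and 0.7 are the ones used in the paper; the component parameters are placeholders.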

A density plot of the data is a very useful tool for choosing the component distributions. The advantages of mixture models can be summarized under four headings. First, the resulting distribution can be multimodal. Second, a mixture model can fit the data better than standard single-distribution models. Third, well-studied statistical inference techniques are available. Fourth, there is flexibility in choosing the component distributions.

2.4. Generalized Pareto Mixture Model

The Generalized Pareto Mixture Model (GPMM) is a parametric probability density function represented as a weighted sum of $k$ component Generalized Pareto densities, as given by the equation

$$f(x) = \sum_{i=1}^{k} w_i\, g(x \mid \mu_i, \xi_i, \sigma_i),$$

where $x$ is an observation, $w_i$, $i = 1, 2, \dots, k$, are the mixture weights and $g(x \mid \mu_i, \xi_i, \sigma_i)$, $i = 1, 2, \dots, k$, are the component Generalized Pareto densities. Each component density is a Generalized Pareto probability density function of the form

$$g(x \mid \mu_i, \xi_i, \sigma_i) = \frac{1}{\sigma_i}\left(1 + \xi_i\,\frac{x-\mu_i}{\sigma_i}\right)^{-\frac{1}{\xi_i}-1},$$

where $\mu_i$ are the location parameters, $\xi_i$ the shape parameters and $\sigma_i$ the scale parameters of the Generalized Pareto probability density function. The mixture weights satisfy the constraint

$$\sum_{i=1}^{k} w_i = 1.$$

3. Simulation Study

3.1. Inverse Transformation Method

Generating random samples is one of the most important subjects in simulation studies, and researchers often rely on generated random samples in their work. A simulation study is not a proof, but it can be a preferable way of demonstrating certain facts. Several methods are used for generating random samples, and their purpose is to generate random samples from an arbitrary distribution. Most of them are quite complex and require computer support, while some are easier for researchers to apply. One of the simple methods is the inverse transformation method, which is based on the uniform distribution.
Once the cumulative distribution function of the intended distribution has been calculated, random samples can be generated by applying its inverse to random samples generated from the uniform distribution. Let $X$ be a random variable with cumulative distribution function $F$. Since $F$ is a nondecreasing function, the inverse function $F^{-1}$ may be defined as

$$F^{-1}(y) = \inf\{x : F(x) \ge y\}, \qquad 0 \le y \le 1.$$

Let $U \sim \mathsf{U}(0,1)$. The cumulative distribution function of the inverse transform $F^{-1}(U)$ is given by

$$P\left(F^{-1}(U) \le x\right) = P\left(U \le F(x)\right) = F(x).$$

Thus, to generate a random variable $X$ with cumulative distribution function $F$, draw $U \sim \mathsf{U}(0,1)$ and set $X = F^{-1}(U)$. This leads to the general method for generating random samples from an arbitrary cumulative distribution function [6].

Algorithm:
1. Generate $U \sim \mathsf{U}(0,1)$.
2. Set $X = F^{-1}(U)$.

Requirements for applying the inverse transformation method:
1. The cumulative distribution function of the intended distribution must be nondecreasing.
2. The inverse of the cumulative distribution function of the intended distribution must be available analytically.

3.2. Chi-Square Goodness-of-Fit Test

In the simulation study, the generated random samples must be checked with a goodness-of-fit test to make sure that they belong to the intended distribution. For this purpose, the chi-square goodness-of-fit test is used, implemented in R.
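The samples to be tested come from the inverse transformation algorithm of Section 3.1. For the three-parameter GPD the cumulative distribution function inverts in closed form, $F^{-1}(u) = \mu + \frac{\sigma}{\xi}\left((1-u)^{-\xi} - 1\right)$ for $\xi \ne 0$, so both requirements are met. The following is a minimal sketch in Python rather than the authors' R code; the names `gpd_quantile` and `gpd_sample` are hypothetical, and the parameterization (shape, scale, location) is assumed to be the standard one.

```python
import math
import random


def gpd_quantile(u, shape, scale, loc):
    """Inverse CDF of the three-parameter GPD for u in [0, 1)."""
    if not 0.0 <= u < 1.0:
        raise ValueError("u must lie in [0, 1)")
    if shape == 0:
        # Limiting case: shifted exponential distribution.
        return loc - scale * math.log(1.0 - u)
    return loc + scale * ((1.0 - u) ** (-shape) - 1.0) / shape


def gpd_sample(n, shape, scale, loc, rng=None):
    """Draw n values by the inverse transformation method.

    Step 1: generate U ~ U(0, 1).  Step 2: set X = F^{-1}(U).
    """
    if rng is None:
        rng = random.Random()
    return [gpd_quantile(rng.random(), shape, scale, loc) for _ in range(n)]
```

Passing a seeded generator, for example `gpd_sample(100, 0.5, 2.0, 1.0, random.Random(42))`, makes a run reproducible. The parameter values here are placeholders, not the ones used in the study.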

The chi-square goodness-of-fit test calculates a test statistic from the differences between the observed frequencies and the expected frequencies under the theoretical distribution. The cumulative distribution function of the theoretical distribution is used to calculate the expected frequencies. The test statistic is calculated as

$$\chi^{2} = \sum_{i=1}^{m} \frac{(O_i - E_i)^{2}}{E_i}, \qquad (3.2.1)$$

where $O_i$ and $E_i$ are the observed and expected frequencies in class $i$ and $m$ is the number of classes. The value obtained from (3.2.1) is called the chi-square test statistic. It is compared with the chi-square table value, and a conclusion is then drawn about the hypotheses.

3.3. Numerical Results

In the application part of this study, random samples are generated from the GPMM by the inverse transformation method. An R code block was written for this purpose. In this code block, the three parameters of each distribution are kept fixed while the random samples are generated. The mixture weights are 0.3 and 0.7.

Algorithm:
1. Generate $U_1, U_2 \sim \mathsf{U}(0,1)$.
2. If $U_1 \le 0.3$, set $X = F_1^{-1}(U_2)$; else (or $U_1 > 0.3$), set $X = F_2^{-1}(U_2)$,

where $F_1$ and $F_2$ are the cumulative distribution functions of the two components.

With these fixed parameters and values, the R code is run 1000 times for different sample sizes, and the samples that follow the intended distribution are identified at the chosen level of significance. The success ratio is reported as (successful trials / all trials). Random samples are generated from six generalized Pareto distributions with different parameters. This step is then repeated 1000 times for each of the sample sizes 10, 20, 50 and 100. Each cell shows the share of generated random samples that passed the goodness-of-fit test.

Table 3: The Results of Chi-Square Test

As the table of results shows, when the sample size is 100, the generated random samples follow the three-parameter generalized Pareto distribution with a success rate above 90% at the chosen significance level. As the sample size increases, the success rate of the generator also increases.
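The per-component experiment described above (generate a sample, test it, repeat, report successful trials over all trials) can be sketched as follows. This is an illustration in Python, not the authors' R code. Several choices are assumptions, since the paper does not state them: five equal-probability classes, a significance level of 0.05 with the corresponding critical value 9.488 for 4 degrees of freedom, and placeholder GPD parameters. All function names are hypothetical.

```python
import math
import random


def gpd_cdf(x, shape, scale, loc):
    """CDF of the three-parameter GPD (assumed standard parameterization)."""
    z = (x - loc) / scale
    if z <= 0:
        return 0.0
    if shape == 0:
        return 1.0 - math.exp(-z)
    t = 1.0 + shape * z
    if t <= 0:
        return 1.0  # beyond the upper endpoint when shape < 0
    return 1.0 - t ** (-1.0 / shape)


def gpd_quantile(u, shape, scale, loc):
    """Inverse CDF used by the inverse transformation method."""
    if shape == 0:
        return loc - scale * math.log(1.0 - u)
    return loc + scale * ((1.0 - u) ** (-shape) - 1.0) / shape


def chi_square_statistic(sample, shape, scale, loc, bins=5):
    """Statistic (3.2.1) with equal-probability classes under the fitted CDF."""
    expected = len(sample) / bins  # same expected count in every class
    observed = [0] * bins
    for x in sample:
        u = gpd_cdf(x, shape, scale, loc)
        observed[min(int(u * bins), bins - 1)] += 1
    return sum((o - expected) ** 2 / expected for o in observed)


# Assumption: alpha = 0.05 and 5 classes with known parameters, so df = 4.
CRITICAL_VALUE_DF4_ALPHA_005 = 9.488


def success_ratio(n, shape, scale, loc, trials=1000, seed=1):
    """Share of generated samples that are not rejected by the test."""
    rng = random.Random(seed)
    passed = 0
    for _ in range(trials):
        sample = [gpd_quantile(rng.random(), shape, scale, loc) for _ in range(n)]
        stat = chi_square_statistic(sample, shape, scale, loc)
        if stat <= CRITICAL_VALUE_DF4_ALPHA_005:
            passed += 1
    return passed / trials
```

Calling `success_ratio(n, 0.5, 2.0, 1.0)` for `n` in 10, 20, 50 and 100 mirrors the layout of the results table. With small samples the expected count per class is low, so the chi-square approximation is rough, and a different binning rule would give different ratios.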
Having verified the generator, we can generate random samples from the GPMM. Following the algorithm, the generalized Pareto distributions listed as indexes 7 and 8 in Table 3 are mixed with mixture weights 0.3 and 0.7 to generate random samples, which are shown in Figure 1. In Figure 2, the generalized Pareto distributions listed as indexes 1 and 6 are mixed with the same mixture weights 0.3 and 0.7, and the resulting samples are shown.

[Table 3 columns: Index, Shape, Scale, Location, n = 10, n = 20, n = 50, n = 100]
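The mixture step can be sketched as a two-stage draw: one uniform variate selects the component according to the mixture weights, and a second, independent uniform variate is passed through that component's inverse cumulative distribution function. This Python sketch is an illustration rather than the authors' R code; the function names are hypothetical, and the component parameters in the usage note are placeholders because the values from Table 3 are not available here.

```python
import math
import random


def gpd_quantile(u, shape, scale, loc):
    """Inverse CDF of the three-parameter GPD for u in [0, 1)."""
    if shape == 0:
        return loc - scale * math.log(1.0 - u)
    return loc + scale * ((1.0 - u) ** (-shape) - 1.0) / shape


def gpmm_sample(n, weights, components, rng=None):
    """Draw n (component_index, value) pairs from a GPD mixture.

    weights: mixture weights summing to 1, e.g. (0.3, 0.7).
    components: one (shape, scale, loc) tuple per component.
    """
    if rng is None:
        rng = random.Random()
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    draws = []
    for _ in range(n):
        u = rng.random()
        cumulative = 0.0
        index = len(components) - 1  # fallback guards against rounding error
        for i, w in enumerate(weights):
            cumulative += w
            if u < cumulative:
                index = i
                break
        # A second, independent uniform variate drives the inversion.
        value = gpd_quantile(rng.random(), *components[index])
        draws.append((index, value))
    return draws
```

For example, `gpmm_sample(1000, (0.3, 0.7), ((0.5, 2.0, 1.0), (0.2, 1.0, 5.0)), random.Random(7))` returns labeled draws, so the share coming from each component can be compared with the weights before plotting the values.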

[Figure 1]

[Figure 2]

4. Conclusions

Researchers often face homogeneous data in their studies and can model such data with a single known distribution. As research areas grow, however, modeling is not always this easy, because the data in many areas are heterogeneous. In these cases, mixture models can be more appropriate for modeling the data. In this study, the GPMM is constructed from two components with finite mixture weights. Random samples are then generated from this mixture model. The results are tested with the chi-square goodness-of-fit test, and the success rate of the inverse transformation method is shown in Table 3 at the chosen significance level. In this test, the components of the mixture model are tested separately. After the success of the method has been established, the generated random samples are plotted in Figure 1 and Figure 2. As a result, the inverse transformation method is a useful way of generating random samples from the generalized Pareto distribution. Once the success of the generator has been established, the mixture model is used to generate random samples with the mixture weights. We can say that random samples from the GPMM can be safely generated by the inverse transformation method.

References

[1] Pickands J. (1975), "Statistical Inference Using Extreme Order Statistics," The Annals of Statistics, 3:
[2] Davison A.C. (1984), Modeling excesses over high threshold with an application, In: J. Tiago de Oliveira (ed.), Statistical Extremes and Applications, Reidel, Dordrecht, pp
[3] Castillo E., Hadi A.S. (1997), Fitting the Generalized Pareto Distribution to Data, JASA, 92(440): Castillo J., Daoudi J. (2008), Estimation of the generalized Pareto distribution, Statistics & Probability Letters, In Press.
[4] Mierlus-Mazilu (2010), On Generalized Pareto Distributions, Romanian Journal of Economic Forecasting, 8:
[5] Beirlant J., de Wet T., Goegebeur Y. (2006), A goodness-of-fit statistic for Pareto-type behaviour, Journal of Computational and Applied Mathematics, 186
[6] Kroese D.P., Taimre T., Botev Z.I. (2011), Handbook of Monte Carlo Methods, John Wiley & Sons, Inc.
[7] Raja T.A., Mir A.H. (2013), On fitting of Generalized Pareto Distribution, Global Journal of Human Social Science Economics, Volume 13 Issue 2.
[8] Li Jia, Mixture Models, Lecture Notes, The Pennsylvania State University, Department of Statistics.