Journal of Statistical Software
|
|
|
- Roxanne Lindsey
- 10 years ago
- Views:
Transcription
1 JSS Journal of Statistical Software January 2005, Volume 12, Issue 4. A Software Tool for the Exponential Power Distribution: The normalp package Angelo M. Mineo Università di Palermo Mariantonietta Ruggieri Università di Palermo Abstract In this paper we present the normalp package, a package for the statistical environment R that has a set of tools for dealing with the exponential power distribution. In this package there are functions to compute the density function, the distribution function and the quantiles from an exponential power distribution and to generate pseudo random numbers from the same distribution. Moreover, methods concerning the estimation of the distribution parameters are described and implemented. It is also possible to estimate linear regression models when we assume the random errors distributed according to an exponential power distribution. A set of functions is designed to perform simulation studies to see the suitability of the estimators used. Some examples of use of this package are provided. Keywords: exponential power distribution, R, estimation, linear regression. 1. Introduction The Exponential Power Distribution (EPD) can be considered a general distribution for random errors. In statistical inference the usual hypothesis on random errors is that they are distributed as a normal distribution, but often this hypothesis is not suitable to the study of natural phenomena. Usually, in literature two alternative approaches are used. The first is related to the outliers theory and deals with robust methods, that are often disputable. The second approach looks for suitable distributional models more general than the gaussian one; in this paper we consider the latter approach. The first formulation of the Exponential Power Distribution could be ascribed Subbotin (1923). Subbotin, starting from these two axioms: 1. The probability of a random error ε depends only on the absolute value of the error itself and can be expressed by a function ϕ(ε) having continuous first derivative almost
2 2 A Software Tool for the Exponential Power Distribution everywhere; 2. The most likely value of a quantity, of which direct measurements are known, must not depend on the used unit of measure; obtained a probability distribution with density function: ϕ(ε) = mh 2Γ(1/m) exp[ hm ε m ] with < ε < +, h > 0 and m 1; this distribution is also known as General Error Distribution (GED). The second axiom considered by Subbotin is the same of the second axiom used by Gauss to derive the usual normal error distribution; the first axiom used by Subbotin is more general: indeed, Gauss considers that the arithmetic mean represents the best way to combine observations, instead of the condition on the first derivative of the function ϕ(ε) used by Subbotin. 2. The exponential power distribution Following the procedure introduced by Pearson (1895), Lunetta (1963) derived a different parametrization of the exponential power distribution by solving the differential equation: d log f dx = p log f log a x c that gives the probability density function: 1 f(x) = ( 2σ p p 1/p Γ(1 + 1/p) exp x µ ) p pσp p (1) with < x < +, < µ < +, σ p > 0 and p > 0. This distribution, called also by Vianelli (1963) normal distribution of order p (this explains the name of the package described in the following paragraphs), is characterized by three parameters: the location parameter, µ = E[X] = σ p = {E[ X µ p ]} 1/p = + { + xf(x) dx } 1/p x µ p f(x) dx the scale parameter and p the shape parameter. The last parameter determines the shape of the curve; in this way, it is linked to the thickness of the tails, and thus to the kurtosis, of the distribution. In fact, by changing the shape parameter p, the exponential power distribution describes both leptokurtic (0 < p < 2) and platikurtic (p > 2) distributions; in particular, we obtain the Laplace distribution for p = 1, the normal distribution for p = 2 and the Uniform distribution for p. A recent review about this distribution family has been produced by Chiodi (2000). In previous papers, Chiodi (1986, 1995) presented some procedures for the
3 Journal of Statistical Software 3 generation of pseudo-random numbers from a standardized exponential power distribution, i.e. an EPD with µ = 0 and σ p = 1. These procedures, based on a generalization of the well-known Box-Muller method (1958) for generating normal pseudo-random numbers, or on acceptance-rejection rules, are, of course, very useful in simulation studies, concerning the behavior of estimators under departures from normality. Other procedures for generating pseudo-random numbers from this distribution have been presented by Johnson (1979) and Tadikamalla (1980), among others. Finally, we have to note that, although with a different parametrization, the exponential power distribution is dealt by Bayesian inference too, when there is the problem to specify a suitable a priori distribution (see, among others, Box and Tiao 1992, Choy and Smith 1997 and Achcar De Araujo Pereira 1999). A bivariate exponential power distribution has been introduced by De Simone (1968) and Taguchi (1978), while a multivariate formulation of this distribution can be found in Fang, Kotz, and Ng (1989) and Krzanowski and Marriott (1994). 3. Estimation of the exponential power distribution parameters By assuming that the shape parameter p is known, the location and scale parameters could be easily obtained by using the maximum likelihood estimation method. The estimate of the shape parameter p is an open problem, so far. Anyway several procedures have been proposed, some of which are very interesting. In literature, the main proposals are based on the maximum likelihood estimation method or on methods based on the computation of kurtosis indices. The derivation of the maximum likelihood estimators does not bring, formally, many problems; on the other hand, the maximum likelihood estimators have suitable properties, at least asymptotically, but in this case, i.e. in the case of the estimation of the shape parameter p, we could meet with computational difficulties by using numerical methods to solve the obtained equation. A new approach, to estimate the location and scale parameter of this distribution based on a genetic algorithm, has been recently proposed by Vitrano and Baragona (2004), but it does not present substantial improvements in comparison with others numerical methods based on the direct evaluation of the likelihood function. The problem of estimating the linear regression parameters when we have random errors distributed according to an exponential power distribution has been treated by Zeckhauser and Thompson (1970) and A.M. Mineo (1995) Estimation of the location and scale parameters Let s assume to have a sample of n i.i.d. observations drawn from (1), then the likelihood function is: ( ni=1 L(x; µ, σ p, p) = [2p 1/p σ p Γ(1 + 1/p)] n x i µ p ) exp (2) and the log-likelihood function is: pσ p p ni=1 l(x; µ, σ p, p) = log L(x; µ, σ p, p) = n log[2p 1/p x i µ p σ p Γ(1 + 1/p)] pσ p p (3) By deriving the log-likelihood function with respect to µ and σ p and by equalizing the obtained
4 4 A Software Tool for the Exponential Power Distribution expressions to zero, we have the following equations: dl(x) dµ = 1 n σp p x i µ p 1 sign(x i µ) = 0 i=1 (4) n x i µ p = 0 (5) i=1 dl(x) dσ p = n σ p + 1 σ p+1 p The equation (4) does not have, in general, an explicit solution and is solved by means of numerical methods, while from (5) we get the maximum likelihood estimator of σ p : ( ni=1 x i µ p ) 1/p ˆσ p =. (6) n Vianelli (1963) called ˆσ p power deviation of order p; it can be seen as a variability index that generalizes the standard deviation. Chiodi shows (1988b) that the maximum likelihood estimator of µ is unbiased, with asymptotic variance given by: var(ˆµ) n = Γ(1/p) p 2/p 2 σ 2 Γ(2 1/p) n p, (7) while Lunetta (1966), when µ is known, obtains the sampling distribution of ˆσ p p: f(x) = λc Γ(c) xc 1 e λx (8) i.e. a Gamma distribution with λ = n/(pσp), p c = n/p, E(ˆσ p) p = σp, p var(ˆσ p) p = (p σp 2p )/n. If µ is unknown, these results are only asymptotically unbiased, while ˆσ p p is biased for small samples. An unbiased estimator, when p is known and ˆµ is the maximum likelihood estimator of µ, is given by Chiodi: ni=1 σ p p x i ˆµ p =, (9) n p/2 that provides also its asymptotic sampling distribution, i.e. the Gamma distribution (8) but with c = n/p 1/2. A paper where are showed some procedures to test hypothesis on homoskedasticity when samples are drawn from an exponential power distribution has been presented by Chiodi (1988a), as well. Finally, Chiodi (1994) uses saddle point approximations to obtain asymptotic sampling distributions of the maximum likelihood estimators for the exponential power distribution parameters. The sampling distributions, obtained in this way, work well even for small samples, for example n = 5. Moreover, we can compute the Fisher information matrix on µ and σ p (Agrò 1995): [ ] Iµµ I I = µσp I σpµ I σpσ p where [ d 2 ] ln f(x) I µµ = E dµ 2 = p 2 ( ) (p 1)p p Γ p 1 p ( ) (10) σpγ 2 1 p
5 Journal of Statistical Software 5 [ d 2 ] ln f(x) I µσp = I σpµ = E = 0 (11) dµdσ p [ d 2 ] ln f(x) I σpσp = E dσp 2 = p σp 2 (12) Then, the parameters µ and σ p are orthogonal, according to the Fisher s information matrix: for the estimation of the parameters, this fact makes possible the use of the conditional profile log likelihood that has better properties than the ordinary profile log likelihood (Cox and Reid 1987). By inverting the information matrix, we obtain the asymptotic matrix of variance of the maximum likelihood estimators (ˆµ, ˆσ p ): I 1 = σp 2Γ( ) 1 p p ( (2 p)/p p 1 (p 1)Γ p 0 ) 0 that for p=1 and p=2 becomes, respectively, the asymptotic matrix of variance for the maximum likelihood estimators of the Laplace and the Gauss distribution parameters Estimation of the shape parameter p The methods presented in literature are based on the likelihood function and on indices of kurtosis. In particular, a comparison among different approaches, i.e. the whole log-likelihood, the profile log-likelihood (Barndorff-Nielsen 1988) and the conditional profile log-likelihood (Cox and Reid 1987), proposed by Agrò (1999), and an index of kurtosis, proposed by A.M. Mineo (1994), has been made by A.M. Mineo (2003b). In this paper, according to a simulation study, it seems that the best method is the one based on the index of kurtosis, while generally the whole log-likelihood approach works better than those based on the profile log-likelihood and the conditional profile log-likelihood. Indeed, the method based on the index of kurtosis VI (see the formula 15) gives estimates of p that seem less biased and more efficient than the estimates obtained by using the three methods based on the log likelihood function. In some way, these results are expected, given the link between the shape parameter p and the kurtosis (cfr ). Estimation of p by means of the maximum likelihood method If we want to determine the maximum likelihood estimator of the shape parameter p, the equation that we obtain by deriving the log-likelihood function (3) is: σ 2 p p dl(x) dp = n 1 n [log p + Ψ(1 + 1/p) 1] + p2 p 2 σp p x i µ p + i=1 + 1 [ n ] n pσp p log σ p x i µ p x i µ p log x i µ = 0 i=1 i=1 (13) where Ψ(.) is the digamma function, that is the first derivative of the logarithm of the gamma function.
6 6 A Software Tool for the Exponential Power Distribution The equation (13) can be solved by using numerical methods. Moreover, Agrò (1992) uses this method showing that it does not work well for small samples, even though it provides good results for samples of size greater than In a following paper, Agrò (1995) computes the inverse of the Fisher information matrix that defines the asymptotic matrix of variance of the maximum likelihood estimators (ˆµ, ˆσ p, ˆp): I 1 = σp 2Γ( ) 1 p p ( (2 p)/p p 1 (p 1)Γ p 0 ) 0 0 } {1 + σ 2 p p [log p+ψ(1+1/p)]2 (1+1/p)Ψ (1+1/p) 1 0 pσ p [log p+ψ(1+1/p)] 2 (1+1/p)Ψ (1+1/p) 1 [log p+ψ(1+1/p)] pσ 2 p (1+1/p)Ψ (1+1/p) 1 p 3 (1+1/p)Ψ (1+1/p) 1 where Ψ (.) is the trigamma function, that is the second derivative of the logarithm of the gamma function. From this matrix we can see how the shape parameter p is orthogonal to the location parameter µ, but is not orthogonal to the scale parameter σ p. Capobianco (2000) observes how the asymptotic variance of the scale parameter estimator is larger than the asymptotic variance of the same estimator obtained for the Gaussian or the Laplace distribution. This is an expected result, since the Gaussian and the Laplace distribution are exponential power distributions when p = 1 and p = 2, respectively; if we substitute these values in the expression of the asymptotic variance, we obtain the same results. The apparent loss of efficiency is just due to the need of estimating p. Estimation of p by means of indices of kurtosis These estimation procedures take into account the relationship between the shape parameter p and the kurtosis. The usually used indices of kurtosis are: where β 2 = µ 4 µ 2 = Γ(1/p)Γ(5/p) 2 [Γ(3/p)] 2 (14) µ2 Γ(1/p)Γ(3/p) V I = = (15) µ 1 Γ(2/p) I = 1 (16) V I β p = µ 2p µ 2 = p + 1 (17) p µ r = σpp p r/p Γ[(r + 1)/p] Γ(1/p) is the absolute moment of grade r. The index β p, called generalized index of kurtosis, has been proposed by Lunetta (1963) and gives for p = 2 the Pearson s index of kurtosis β 2. The estimators of the indices of kurtosis above described are given by: (18) ˆβ 2 = n n i=1 (x i M) 4 [ n i=1 (x i M) 2 ] 2 (19) n ni=1 (x i M) V I = 2 ni=1 (20) x i M
7 Journal of Statistical Software 7 Î = 1 V I (21) ˆβ p = n n i=1 x i M 2ˆp ( n i=1 x i M ˆp = ˆp + 1. (22) ) 2 where M is the arithmetic mean. A first solution to this problem has been given by Lunetta (1967), which observed how the kurtosis indices could be used for estimating p. Afterwards, A. Mineo (1980) proposed the solution of an equation based on a suitable combination of sampling indices of kurtosis. Again, A. Mineo (1986) shows by simulation studies the lower efficiency of some well-known robust estimators, compared to the maximum likelihood estimator of the location parameter µ. In the same paper, the author estimates p by using an inverse interpolation procedure. In a following paper, A. Mineo (1989), by implementing an iterative procedure, obtains estimators for a simple linear regression model with errors distributed as a normal distribution of order p and an estimate of p by solving the equation: ( n n ) n x i M p 2ˆp 2 (ˆp + 1) x i M ˆp p = 0 (23) i=1 i=1 that derives from the theoretical value of the generalized index of kurtosis β p. In this formula, M p is an estimate of the location parameter µ, obtained by solving the equation (4): this is necessary to make possible the iterative process of estimation of all the three parameters µ, σ p and p. An estimation method based on the index of kurtosis V I has been proposed by A.M. Mineo (1994), by comparing the theoretical value and the empirical value computed on the sample. The index V I has been preferred to β 2, because it is less affected by extreme values on data, in order to obtain better estimates, even when small samples (n 50) are considered. Moreover, the equation (23), obtained from β p, has not been considered since its expression is more difficult computationally. The validity of the proposed procedure has been provided by simulation studies. Actually, the obtained results seem to be very interesting and better than the ones obtained by methods based on the likelihood function (A.M. Mineo, 2003b). 4. The normalp package The normalp package (A.M. Mineo, 2003a) is an R (R Development Core Team 2004) package containing a collection of tools related to the exponential power distribution. The implemented functions generalize some commands already available in the base package, related to observations distributed as a normal distribution. Moreover, useful commands deal with estimation problems for linear regression models when the random errors are drawn from an exponential power distribution. The normalp package is available on the Comprehensive R Archive Network (CRAN 1 ) Computation of the exponential power distribution The normalp package has four functions, dnormp(), pnormp(), qnormp() and rnormp(), to compute the typical quantities of a probability distribution, i.e.: 1 URL:
8 8 A Software Tool for the Exponential Power Distribution dnormp(), that gives the density function of an exponential power distribution; pnormp(), that gives the probability function of an exponential power distribution; qnormp(), that gives the quantiles of an exponential power distribution; rnormp(), that generates a vector of n pseudo-random numbers from an exponential power distribution. The use of these functions is the following: dnormp(x, mu = 0, sigmap = 1, p = 2, log = FALSE)} x vector of quantiles; mu vector of location parameters; sigmap vector of scale parameters; p shape parameter; log logical; if TRUE the density is given as a log(density). pnormp(q, mu = 0, sigmap = 1, p = 2, lower.tail = TRUE, log.pr = FALSE) q vector of quantiles; mu vector of location parameters; sigmap vector of scale parameters; p shape parameter; lower.tail logical; if FALSE gives P [X > x], otherwise (default) gives P [X x]; log.pr logical; if TRUE probabilities pr are given as log(pr). qnormp(pr, mu = 0, sigmap = 1, p = 2, lower.tail = TRUE, log.pr = FALSE) pr vector of probabilities; mu vector of location parameters; sigmap vector of scale parameters; p shape parameter; lower.tail logical; if FALSE gives P [X > x], otherwise (default) gives P [X x]; log.pr logical; if TRUE probabilities pr are given as log(pr). rnormp(n, mu = 0, sigmap = 1, p = 2, method = "def") n vector of observations; mu vector of location parameters; sigmap vector of scale parameters; p shape parameter; method if set to chiodi it uses an algorithm based on a generalization of the Marsaglia (1964) formula, suggested by Chiodi (1986), otherwise it uses a faster method based on the relationship linking an exponential power distribution and a Gamma distribution (Lunetta 1963).
9 Journal of Statistical Software Graphical functions Besides the above described functions, some useful graphical functions have been implemented. In particular, these functions are: graphnp(), that visualizes from one to five different exponential power distributions, each one plotted with a different color; qqnormp(), that gives an exponential power distribution Q-Q plot; qqlinep(), that sketches a line through the first and the third quartile in an exponential power distribution Q-Q plot; Then, the functions qqnormp() and qqlinep() allow to check graphically if a vector of observations can be considered drawn from an exponential power distribution. The use of these functions is, respectively: graphnp(p = c(1, 2, 3, 4, 5), mu = 0, sigmap = 1, title="exponential power distributions") p vector of p values. It has to contain at most five elements. If one or more values of p are 50, the plot will give an uniform distribution; mu value of the location parameter; sigmap value of the scale parameter; title the plot title. qqnormp(y, ylim, p = 2, main = "Exponential power distribution Q-Q plot", xlab = "Theoretical Quantiles", ylab = "Sample Quantiles",...) qqlinep(y, p = 2,...) y p main, xlab, ylab ylim,... vector of observations; shape parameter; plot labels; graphical parameters Functions to estimate the exponential power distribution parameters Another set of functions is related to the estimation of the parameters of an exponential power distribution. In particular, we have the following functions: estimatep(), that estimates the shape parameter p from a vector of observations by means of the index of kurtosis V I (A.M. Mineo, 1994); paramp(), that estimates the location parameter µ and the scale parameter σ p by means of the maximum likelihood method, by considering the two cases when p is known
10 10 A Software Tool for the Exponential Power Distribution and when p is unknown; in the latter case, an estimate of p is obtained from the estimatep() function; this function returns a list of elements: the arithmetic mean, the standard deviation, the estimated values of location, scale and shape parameters of an exponential power distribution. Moreover, if the message iter=1 prints out, then problems on convergence have arisen in the computation; kurtosis(), that computes the theoretical and the empirical values of the kurtosis indices V I, β 2 and β p ; simul.mp(), that performs a Monte Carlo simulation to verify the behavior of the estimators of the parameters µ, σ p and p, behavior that can be seen graphically by using the function plot.simul.mp(), that returns an histogram for each set of the estimates produced by simul.mp(). The function simul.mp() returns a list containing the following components dat: a matrix m 5 reporting the result of paramp() for each sample; table: a matrix 2 5 of the means and the variances of the five estimator values. The use of these functions is described below: estimatep(x, mu = 0, p = 2, method = c("inverse", "direct")) x vector of observations; mu the location parameter; p starting value of the shape parameter; method the user can choose an inverse interpolation (faster) or a direct solution of the equation V I = V I. paramp(x, p = NULL,...) x vector of observations; p if specified, the algorithm does not use an estimate of p, but the specified value;... other arguments passed to or from other methods. kurtosis(x, p = 3, value = "parameter") x a sample of observations; if it is not specified, theoretical indices are computed; p the shape parameter; value if set to parameter, the value specified in p is used, otherwise an estimate of p is computed and estimates of the three indices are given. simul.mp(n, m, mu = 0, sigmap = 1, p = 2)
11 Journal of Statistical Software 11 n sample size; m number of samples; mu the location parameter; sigmap the scale parameter; p the shape parameter. plot.simul.mp(x,...) x a simul.mp object, i.e. a result of simul.mp();... further arguments passed to or from other methods Functions to deal with linear regression problems The normalp package contains also a set of functions to fit linear regression models, when random errors are distributed according to an exponential power distribution. The implemented functions are: lmp() that evaluates the coefficients of the linear regression model by means of the maximum likelihood method, either when p is known, or when p is unknown (A.M. Mineo, 1995); in the latter case an estimate of p is obtained from estimatep(). The lmp() function returns a list containing the following components: coefficients: the coefficient estimates; residuals: the residuals; fitted.values: the fitted values; rank: the numeric rank of the fitted linear model; df.residual: the residual degrees of freedom computed as in lm(); call: the matched call; terms: the terms object used; p: the p estimate computed on residuals; knp: a logical parameter used by summary(); model: the model frame used; iter: its value is 1 if difficulties on convergence arise. plot.lmp() that performs an analysis of residuals by returning a set of plots: a plot of residuals against fitted values; a Normal Q-Q plot; an Exponential Power Distribution Q-Q plot; a Scale-Location plot.
12 12 A Software Tool for the Exponential Power Distribution summary.lmp() that produces a list of summary statistics related to the fitted linear model given by lmp(); among them, the power deviation of order p computed on residuals: [ ] 1 e p p i S p = (24) n q where q is the number of the estimated regression coefficients if p is known, otherwise is the number of the estimated regression coefficients plus 1. simul.lmp() that allows to verify the behavior of the regression parameter estimators and the estimators of σ p and p by means of a Monte Carlo simulation study. This function returns a list containing a matrix reporting the result of lmp() for all the samples and a matrix of means and variances of the m estimates for each related parameter. The regressors are drawn from an Uniform distribution. summary.simul.lmp() that summarizes the results of simul.lmp plot.simul.lmp() returns a histogram for each set of the estimates computed by simul.lmp(). The use of these functions is the following (it is very similar to that of lm() and related functions of the stats package): lmp(formula, data = list(), p = NULL) formula the model to be fitted; data an optional data frame containing the variables in the model. By default the variables are taken from the environment; p the shape parameter. It is estimated if is not specified. plot.lmp(x,...) x is a lmp object, i.e. a result of lmp();... are further arguments passed to or from other methods. summary.lmp(object,...) object a lmp object, i.e. a result of lmp();... further arguments passed to or from other methods. simul.lmp(n, m, q, data, int = 0, sigmap = 1, p = 2, lp = FALSE)
13 Journal of Statistical Software 13 n sample size; m number of samples; q number of regressors; data a vector of coefficients; int the intercept value; sigmap the scale parameter; p the shape parameter; lp if FALSE p is estimated, otherwise the specified value of p is considered. summary.simul.lmp(x,...) x a simul.lmp object, i.e. a result of simul.lmp();... further arguments passed to or from other methods. plot.simul.lmp(x,...) x a simul.lmp object, i.e. a result of simul.lmp();... further arguments passed to or from other methods. 5. Some examples of use In this section different examples of use of the normalp package will be showed. All the examples have only the purpose to show some possible ways of use of the implemented functions Dealing with an exponential power distribution Let s compute the density for a x value with mu = 0, sigmap = 1 and p = 1.5: R> dnormp(2, p = 1.5) [1] Let s compute the distribution function for a q value with mu = 1, sigmap = 2 and p = 1.5: R> pnormp(0.7, mu = 1, sigmap = 2, p = 1.5) [1] Let s compute the quantile for a probability value pr = 0.3 with mu = 3, sigmap = 2 and p = 1.5: R> qnormp(0.3, mu = 3, sigmap = 2, p = 1.5) [1]
14 14 A Software Tool for the Exponential Power Distribution In the following example, we generate a random sample of size n = 10 from an exponential power distribution with mu = 2, sigmap = 3 and p = 1.5: R> rnormp(10, 2, 3, 1.5) [1] [7] With the following command, we plot five different distributions with p=1, 2, 3, 4, 50 (the last one will be the uniform distribution): R> graphnp(c(1 : 4, 50)) (see Figure 1) Exponential Power Distributions f(x) p= 1 p= 2 p= 3 p= 4 p > x Figure 1: Exponential power distributions. and an exponential power distribution Q-Q plot for a sample of size n = 100: R> x <- rnormp(100, p = 3) R> qqnormp(x, p = 3) R> qqlinep(x, p = 3) (see Figure 2) 5.2. Estimating the exponential power distribution parameters Let s estimate the shape parameter p from a vector of observations: R> x <- rnormp(100, 1, 2, 4) R> p <- estimatep(x, 1, 2) R> p
15 Journal of Statistical Software 15 Exponential power distribution Q Q plot p= 3 Sample Quantiles Theoretical Quantiles Figure 2: Exponential power distribution Q-Q plot for a sample of size n = 100. [1] Let s estimate µ, σ p and p from a vector of observations: R> x <- rnormp(100, 1, 2, 3) R> parameters <- paramp(x) R> parameters Mean Mp Sd Sp p Let s compute the values of theoretical and empirical indices of kurtosis, with p known or estimated: R> kurtosis(p = 2) VI B2 Bp R> kurtosis(x, p = 2, value = "parameter") VI B2 Bp R> kurtosis(x) VI B2 Bp
16 16 A Software Tool for the Exponential Power Distribution Let s perform a simulation plan for m = 100 samples of size n = 30, with mu = 0, sigmap = 1, p = 3, by showing the histogram for each set of estimates: R> sim <- simul.mp(30, 100, p = 3) R> sim Mean Mp Sd Sp p Mean Variance Number of samples with a difficult convergence: 1 R> par(mfrow = c(3, 2)) R> plot(sim) R> par(mfrow = c(1, 1)) (see Figure 3) Histogram of Mean Histogram of Mp Density Density Histogram of Sd Histogram of Sp Density Density Histogram of p Density Figure 3: Histograms of the estimates obtained by simul.mp() Fitting a linear regression model with random errors distributed according to an exponential power distribution In this subsection we consider an example that has only the purpose to show the use of the implemented functions to fit a linear regression model when we have to assume that the random errors are distributed according to an exponential power distribution. Let s consider the following data set taken from Levine, Krehbiel, and Berenson (2000); this data set reports the profits at the box office in million of dollars (Gross) and the number of sold home videos in thousands (Videos) for a sample of 30 movies; the idea is that the number of sold home videos is related to the profits at the box office:
17 Journal of Statistical Software 17 R> dataset <- read.table("movie.txt", header = TRUE) R> dataset Gross Videos R> attach(dataset) R> res.lmp <- lmp(videos ~ Gross) R> res.lmp Call: lmp(formula = Videos ~ Gross) Coefficients: (Intercept) Gross Let s report a summary of the main results with a plot showing the two straight lines derived by using the OLS methods and the lmp() method:
18 18 A Software Tool for the Exponential Power Distribution R> summary(res.lmp) Call: lmp(formula = Videos ~ Gross) Residuals: Min 1Q Median 3Q Max Coefficients: (Intercept) Gross Estimate of p Power deviation of order p: R> plot(videos ~ Gross) R> abline(lm(videos ~ Gross)) R> abline(res.lmp, col = "red") (see Figure 4) Videos Gross Figure 4: Plot of the linear regression models. Let s perform an analysis of residuals:
19 p Journal of Statistical Software 19 R> par(mfrow = c(2, 2)) R> plot(res.lmp) (see Figure 5) Residuals vs Fitted Normal Q Q plot Residuals Standardized residuals Fitted values lmp(formula = Videos ~ Gross) Theoretical Quantiles lmp(formula = Videos ~ Gross) Standardized residuals Exponential power distribution Q Q plot p= Standardized residuals Scale Location plot Theoretical Quantiles lmp(formula = Videos ~ Gross) Fitted values lmp(formula = Videos ~ Gross) Figure 5: Residuals plots obtained by the function plot.lmp(). Let s perform a simulation plan for m = 100 samples of size n = 20 for a linear regression model with only one regressor, by showing the histogram for each set of estimates: R> sim <- simul.lmp(20, 100, 1, 1.5, 1, 1, 3) R> sim (intercept) x1 Sp p Mean Variance Number of samples with a difficult convergence: 1 R> summary(sim) Results: (intercept) x1 Sp p Mean Variance Coefficients: (intercept) x
20 20 A Software Tool for the Exponential Power Distribution Formula: y~+x1 Number of samples: 100 Value of p: 3 Number of samples with problems on convergence 1 R> par(mfrow = c(2, 2)) R> plot(sim) (see Figure 6) Histogram of intercept Histogram of x1 Density Density Histogram of Sp Histogram of p Density Density Figure 6: Histograms of the estimates produced by simul.lmp(). 6. Conclusions In this paper, we have described some results obtained on the Exponential Power Distribution and implemented in a package, normalp, that is a contributed package of the statistical environment R. The implemented functions concern essentially estimation problems for linear regression models, besides some graphical commands that generalize graphical tools already implemented in the package graphs of R, related to observations distributed as a normal distribution. At the end of the paper, several examples of use have been showed. People that are used to working with R can find very easy the syntax of the functions implemented in this package. We think that this package could be very useful for anyone has to treat with this
21 Journal of Statistical Software 21 kind of probability distribution. Our intention is to work more on this package to improve its flexibility. Acknowledgments The authors are grateful to two anonymous referees for thorough and constructive comments which greatly improved this paper. References Achcar De Araujo Pereira JA (1999). Use of Exponential Power Distributions for Mixture Models in the Presence of Covariates. Journal of Applied Statistics, 26, Agrò G (1992). Maximum Likelihood and L p -norm Estimators. Statistica Applicata, 4(2), Agrò G (1995). Maximum Likelihood Estimation for the Exponential Power Distribution. Communications in Statistics (simulation and computation), 24(2), Agrò G (1999). Parameter Orthogonality and Conditional Profile Likelihood: The Exponential Power Function case. Communications in Statistics (theory and methods), 28(8), Barndorff-Nielsen OE (1988). Parametric Statistical Models and Likelihood. Lecture Notes in Statistics, No. 50, Springer-Verlag. Box GEP, Muller ME (1958). A Note on the Generalization of Random Normal Deviates. Annals of Mathematical Statistics, 29, Box GEP, Tiao GC (1992). Bayesian Inference in Statistical Analysis. J. Wiley, New York. Capobianco R (2000). Robustness Aspects of the Generalized Normal Distribution. Quaderni di Statistica, Centro per la formazione in Economia e Politica dello sviluppo rurale, Dipartimento di Scienze Statistiche dell Università di Napoli Federico II, Dipartimento di Scienze Economiche dell Università di Salerno, 2, Chiodi M (1986). Procedures for Generating Pseudo-Random Numbers from a Normal Distribution of Order p. Statistica Applicata, 19, Chiodi M (1988a). Due Tests per la Verifica di Ipotesi di Omogeneità ed Omoscedasticità per Campioni Provenienti da Distribuzioni Normali di Ordine p. Technical report, Istituto di Statistica, Facoltà di Economia e Commercio di Palermo. Chiodi M (1988b). Sulle Distribuzioni di Campionamento delle Stime di Massima Verosimiglianza dei Parametri delle Curve Normali di Ordine p. Technical report, Istituto di Statistica, Facoltà di Economia e Commercio di Palermo. Chiodi M (1994). Approssimazioni Saddle-point alle Distribuzioni Campionarie degli Stimatori di Massima Verosimiglianza dei Parametri delle Curve Normali di Ordine p per Piccoli Campioni. In Atti della XXXVII Riunione Scientifica della SIS, pp San Remo.
22 22 A Software Tool for the Exponential Power Distribution Chiodi M (1995). Generation of Pseudo-Random Variates from a Normal Distribution of Order p. Statistica Applicata, 7, Chiodi M (2000). Le Curve Normali di Ordine p come Distribuzioni di Errori Accidentali: una Rassegna dei Risultati e Problemi Aperti per il Caso Univariato e per quello Multivariato. In Atti della XL Riunione Scientifica della SIS, pp Firenze. Choy STB, Smith AFM (1997). On Robust Analysis of a Normal Location Parameter. Journal of the Royal Statistical Society, B, 59, Cox DR, Reid N (1987). Parameter Orthogonality and Approximate Conditional Inference. Journal of the Royal Statistical Society (B), 49, De Simone S (1968). Su una Estensione dello Schema delle Curve Normali di Ordine r alle Variabili Doppie. Statistica, 37, Fang KT, Kotz S, Ng KW (1989). Symmetric Multivariate and Related Distributions. Chapman and Hall, New York. Johnson ME (1979). Computer Generation of the Exponential Power Distributions. Journal of Statistical Computation and Simulation, 9, Krzanowski WJ, Marriott FHC (1994). Multivariate Analysis (Part I). Edward Arnold, London. Levine DM, Krehbiel TC, Berenson ML (2000). Business Statistics: A First Course. Prentice- Hall. Lunetta G (1963). Di una Generalizzazione dello Schema della Curva Normale. Annali della Facoltà di Economia e Commercio di Palermo, 17, Lunetta G (1966). Di Alcune Distribuzioni Deducibili da una Generalizzazione dello Schema della Curva Normale. Annali della Facoltà di Economia e Commercio di Palermo, 20, Lunetta G (1967). Su una Estensione del Coefficiente di Correlazione Lineare. Annali della Facoltà di Economia e Commercio di Palermo, 21, Marsaglia G, MacLaren MD, Bray TA (1964). A Fast Procedure for Generating Normal Random Variables. Communications of the ACM, 7, Mineo A (1980). La Stima dei Parametri delle Curve Normali di Ordine r per Intervalli Finiti e di quelle per Intervalli Infiniti. In CLEUB (ed.), Studi in Onore di Paolo Fortunati, pp Bologna. Mineo A (1986). La Migliore Combinazione delle Osservazioni per la Stima dei Parametri di Intensità e di Dispersione. In Studi in Onore di Silvio Vianelli, pp Palermo. Mineo A (1989). The Norm-p Estimation of Location, Scale and Simple Linear Regression Parameters. In Sringer-Verlag (ed.), Lectures Notes in Statistics (57), pp Heidelberg.
23 Journal of Statistical Software 23 Mineo AM (1994). Un Nuovo Metodo di Stima di p per una Corretta Valutazione dei Parametri di Intensità e di Scala di una Curva Normale di Ordine p. In Atti della XXXVII Riunione Scientifica della SIS, pp San Remo. Mineo AM (1995). Stima dei Parametri di Regressione Lineare Semplice quando gli Errori Seguono una Distribuzione Normale di Ordine p (p incognito). Annali della Facoltà di Economia di Palermo (area statistico-matematica), 49, Mineo AM (2003a). The normalp package. URL contrib/packages.html#normalp. Mineo AM (2003b). On the Estimation of the Structure Parameter of a Normal Distribution of Order p. Statistica, 63, Pearson K (1895). Contributions to the Mathematical Theory of Evolution. II. Skew Variation in Homogeneous Material. Philosophical Transactions of the Royal Society of London (A), 186, R Development Core Team (2004). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN , URL http: // Subbotin MT (1923). On the Law of Frequency of Errors. Matematicheskii Sbornik, 31, Tadikamalla PR (1980). Random Sampling from the Exponential Power Distribution. Journal of the American Statistical Association, 75, Taguchi T (1978). On a Generalization of Gaussian Distribution. Annals of the Institute of Statistical Mathematics, 30, Vianelli S (1963). La Misura della Variabilità Condizionata in uno Schema Generale delle Curve Normali di Frequenza. Statistica, 23, Vitrano S, Baragona R (2004). The Genetic Algorithm Estimates for the Parameters of Order p Normal Distributions. In Springer-Verlag (ed.), Advances in Multivariate Data Analysis, pp Heidelberg. Zeckhauser R, Thompson M (1970). Linear Regression with Non-Normal Error Terms. The Review of Economics and Statistics, 52,
24 24 A Software Tool for the Exponential Power Distribution Affiliation: Angelo M. Mineo Dipartimento di Scienze Statistiche e Matematiche Silvio Vianelli Università di Palermo Palermo, Italy Telephone: +39/091/ Fax: +39/091/ [email protected] URL: Journal of Statistical Software Submitted: January 2005, Volume 12, Issue 4. Accepted:
A teaching experience through the development of hypertexts and object oriented software.
A teaching experience through the development of hypertexts and object oriented software. Marcello Chiodi Istituto di Statistica. Facoltà di Economia-Palermo-Italy 1. Introduction (1) This paper is concerned
Multivariate Normal Distribution
Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written
Supplement to Call Centers with Delay Information: Models and Insights
Supplement to Call Centers with Delay Information: Models and Insights Oualid Jouini 1 Zeynep Akşin 2 Yves Dallery 1 1 Laboratoire Genie Industriel, Ecole Centrale Paris, Grande Voie des Vignes, 92290
Dongfeng Li. Autumn 2010
Autumn 2010 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis
Least Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David
Geostatistics Exploratory Analysis
Instituto Superior de Estatística e Gestão de Informação Universidade Nova de Lisboa Master of Science in Geospatial Technologies Geostatistics Exploratory Analysis Carlos Alberto Felgueiras [email protected]
A SURVEY ON CONTINUOUS ELLIPTICAL VECTOR DISTRIBUTIONS
A SURVEY ON CONTINUOUS ELLIPTICAL VECTOR DISTRIBUTIONS Eusebio GÓMEZ, Miguel A. GÓMEZ-VILLEGAS and J. Miguel MARÍN Abstract In this paper it is taken up a revision and characterization of the class of
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts
Basic Statistics and Data Analysis for Health Researchers from Foreign Countries
Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma [email protected] The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association
Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics
Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in
11 Linear and Quadratic Discriminant Analysis, Logistic Regression, and Partial Least Squares Regression
Frank C Porter and Ilya Narsky: Statistical Analysis Techniques in Particle Physics Chap. c11 2013/9/9 page 221 le-tex 221 11 Linear and Quadratic Discriminant Analysis, Logistic Regression, and Partial
Gamma Distribution Fitting
Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics
Introduction to General and Generalized Linear Models
Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby
Logistic Regression (a type of Generalized Linear Model)
Logistic Regression (a type of Generalized Linear Model) 1/36 Today Review of GLMs Logistic Regression 2/36 How do we find patterns in data? We begin with a model of how the world works We use our knowledge
Handling attrition and non-response in longitudinal data
Longitudinal and Life Course Studies 2009 Volume 1 Issue 1 Pp 63-72 Handling attrition and non-response in longitudinal data Harvey Goldstein University of Bristol Correspondence. Professor H. Goldstein
STATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution
A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September
Data Preparation and Statistical Displays
Reservoir Modeling with GSLIB Data Preparation and Statistical Displays Data Cleaning / Quality Control Statistics as Parameters for Random Function Models Univariate Statistics Histograms and Probability
Java Modules for Time Series Analysis
Java Modules for Time Series Analysis Agenda Clustering Non-normal distributions Multifactor modeling Implied ratings Time series prediction 1. Clustering + Cluster 1 Synthetic Clustering + Time series
Statistical Models in R
Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Structure of models in R Model Assessment (Part IA) Anova
MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators...
MATH4427 Notebook 2 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 2009-2016 by Jenny A. Baglivo. All Rights Reserved. Contents 2 MATH4427 Notebook 2 3 2.1 Definitions and Examples...................................
SAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
Exploratory Data Analysis
Exploratory Data Analysis Johannes Schauer [email protected] Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction
STA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! [email protected]! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct
UNDERSTANDING THE INDEPENDENT-SAMPLES t TEST
UNDERSTANDING The independent-samples t test evaluates the difference between the means of two independent or unrelated groups. That is, we evaluate whether the means for two independent groups are significantly
Applications of R Software in Bayesian Data Analysis
Article International Journal of Information Science and System, 2012, 1(1): 7-23 International Journal of Information Science and System Journal homepage: www.modernscientificpress.com/journals/ijinfosci.aspx
Chapter 3 RANDOM VARIATE GENERATION
Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.
Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur
Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special Distributions-VI Today, I am going to introduce
Bias in the Estimation of Mean Reversion in Continuous-Time Lévy Processes
Bias in the Estimation of Mean Reversion in Continuous-Time Lévy Processes Yong Bao a, Aman Ullah b, Yun Wang c, and Jun Yu d a Purdue University, IN, USA b University of California, Riverside, CA, USA
VISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA
VISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA Csilla Csendes University of Miskolc, Hungary Department of Applied Mathematics ICAM 2010 Probability density functions A random variable X has density
From the help desk: Bootstrapped standard errors
The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution
SUMAN DUVVURU STAT 567 PROJECT REPORT
SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.
BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I
BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential
NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
Simple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation
Generating Random Numbers Variance Reduction Quasi-Monte Carlo. Simulation Methods. Leonid Kogan. MIT, Sloan. 15.450, Fall 2010
Simulation Methods Leonid Kogan MIT, Sloan 15.450, Fall 2010 c Leonid Kogan ( MIT, Sloan ) Simulation Methods 15.450, Fall 2010 1 / 35 Outline 1 Generating Random Numbers 2 Variance Reduction 3 Quasi-Monte
Algebra 1 Course Information
Course Information Course Description: Students will study patterns, relations, and functions, and focus on the use of mathematical models to understand and analyze quantitative relationships. Through
Normality Testing in Excel
Normality Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. [email protected]
BayesX - Software for Bayesian Inference in Structured Additive Regression
BayesX - Software for Bayesian Inference in Structured Additive Regression Thomas Kneib Faculty of Mathematics and Economics, University of Ulm Department of Statistics, Ludwig-Maximilians-University Munich
GLMs: Gompertz s Law. GLMs in R. Gompertz s famous graduation formula is. or log µ x is linear in age, x,
Computing: an indispensable tool or an insurmountable hurdle? Iain Currie Heriot Watt University, Scotland ATRC, University College Dublin July 2006 Plan of talk General remarks The professional syllabus
Simulation Exercises to Reinforce the Foundations of Statistical Thinking in Online Classes
Simulation Exercises to Reinforce the Foundations of Statistical Thinking in Online Classes Simcha Pollack, Ph.D. St. John s University Tobin College of Business Queens, NY, 11439 [email protected]
Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, and Discrete Changes
Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, Discrete Changes JunXuJ.ScottLong Indiana University August 22, 2005 The paper provides technical details on
DATA INTERPRETATION AND STATISTICS
PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE
Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)
Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume
A logistic approximation to the cumulative normal distribution
A logistic approximation to the cumulative normal distribution Shannon R. Bowling 1 ; Mohammad T. Khasawneh 2 ; Sittichai Kaewkuekool 3 ; Byung Rae Cho 4 1 Old Dominion University (USA); 2 State University
Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C
South Carolina College- and Career-Ready (SCCCR) Probability and Statistics
South Carolina College- and Career-Ready (SCCCR) Probability and Statistics South Carolina College- and Career-Ready Mathematical Process Standards The South Carolina College- and Career-Ready (SCCCR)
Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software
STATA Tutorial Professor Erdinç Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software 1.Wald Test Wald Test is used
Descriptive Statistics
Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web
MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group
MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could
Spatial Statistics Chapter 3 Basics of areal data and areal data modeling
Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data
Reject Inference in Credit Scoring. Jie-Men Mok
Reject Inference in Credit Scoring Jie-Men Mok BMI paper January 2009 ii Preface In the Master programme of Business Mathematics and Informatics (BMI), it is required to perform research on a business
Lecture 2: Descriptive Statistics and Exploratory Data Analysis
Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals
Using R for Linear Regression
Using R for Linear Regression In the following handout words and symbols in bold are R functions and words and symbols in italics are entries supplied by the user; underlined words and symbols are optional
Package SHELF. February 5, 2016
Type Package Package SHELF February 5, 2016 Title Tools to Support the Sheffield Elicitation Framework (SHELF) Version 1.1.0 Date 2016-01-29 Author Jeremy Oakley Maintainer Jeremy Oakley
PROPERTIES OF THE SAMPLE CORRELATION OF THE BIVARIATE LOGNORMAL DISTRIBUTION
PROPERTIES OF THE SAMPLE CORRELATION OF THE BIVARIATE LOGNORMAL DISTRIBUTION Chin-Diew Lai, Department of Statistics, Massey University, New Zealand John C W Rayner, School of Mathematics and Applied Statistics,
Multiple Linear Regression
Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is
NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS
NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS TEST DESIGN AND FRAMEWORK September 2014 Authorized for Distribution by the New York State Education Department This test design and framework document
Statistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
Statistical Models in R
Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Linear Models in R Regression Regression analysis is the appropriate
An Internal Model for Operational Risk Computation
An Internal Model for Operational Risk Computation Seminarios de Matemática Financiera Instituto MEFF-RiskLab, Madrid http://www.risklab-madrid.uam.es/ Nicolas Baud, Antoine Frachot & Thierry Roncalli
Maximum Likelihood Estimation
Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for
A MULTIVARIATE OUTLIER DETECTION METHOD
A MULTIVARIATE OUTLIER DETECTION METHOD P. Filzmoser Department of Statistics and Probability Theory Vienna, AUSTRIA e-mail: [email protected] Abstract A method for the detection of multivariate
COMMON CORE STATE STANDARDS FOR
COMMON CORE STATE STANDARDS FOR Mathematics (CCSSM) High School Statistics and Probability Mathematics High School Statistics and Probability Decisions or predictions are often based on data numbers in
GENERALIZED LINEAR MODELS IN VEHICLE INSURANCE
ACTA UNIVERSITATIS AGRICULTURAE ET SILVICULTURAE MENDELIANAE BRUNENSIS Volume 62 41 Number 2, 2014 http://dx.doi.org/10.11118/actaun201462020383 GENERALIZED LINEAR MODELS IN VEHICLE INSURANCE Silvie Kafková
Lecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
Statistics Graduate Courses
Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
Interpretation of Somers D under four simple models
Interpretation of Somers D under four simple models Roger B. Newson 03 September, 04 Introduction Somers D is an ordinal measure of association introduced by Somers (96)[9]. It can be defined in terms
How To Write A Data Analysis
Mathematics Probability and Statistics Curriculum Guide Revised 2010 This page is intentionally left blank. Introduction The Mathematics Curriculum Guide serves as a guide for teachers when planning instruction
MATHEMATICAL METHODS OF STATISTICS
MATHEMATICAL METHODS OF STATISTICS By HARALD CRAMER TROFESSOK IN THE UNIVERSITY OF STOCKHOLM Princeton PRINCETON UNIVERSITY PRESS 1946 TABLE OF CONTENTS. First Part. MATHEMATICAL INTRODUCTION. CHAPTERS
Comparison of Estimation Methods for Complex Survey Data Analysis
Comparison of Estimation Methods for Complex Survey Data Analysis Tihomir Asparouhov 1 Muthen & Muthen Bengt Muthen 2 UCLA 1 Tihomir Asparouhov, Muthen & Muthen, 3463 Stoner Ave. Los Angeles, CA 90066.
2WB05 Simulation Lecture 8: Generating random variables
2WB05 Simulation Lecture 8: Generating random variables Marko Boon http://www.win.tue.nl/courses/2wb05 January 7, 2013 Outline 2/36 1. How do we generate random variables? 2. Fitting distributions Generating
Tutorial 5: Hypothesis Testing
Tutorial 5: Hypothesis Testing Rob Nicholls [email protected] MRC LMB Statistics Course 2014 Contents 1 Introduction................................ 1 2 Testing distributional assumptions....................
Prentice Hall Algebra 2 2011 Correlated to: Colorado P-12 Academic Standards for High School Mathematics, Adopted 12/2009
Content Area: Mathematics Grade Level Expectations: High School Standard: Number Sense, Properties, and Operations Understand the structure and properties of our number system. At their most basic level
Data Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber 2011 1
Data Modeling & Analysis Techniques Probability & Statistics Manfred Huber 2011 1 Probability and Statistics Probability and statistics are often used interchangeably but are different, related fields
Statistical Analysis of Life Insurance Policy Termination and Survivorship
Statistical Analysis of Life Insurance Policy Termination and Survivorship Emiliano A. Valdez, PhD, FSA Michigan State University joint work with J. Vadiveloo and U. Dias Session ES82 (Statistics in Actuarial
Survival Analysis of Left Truncated Income Protection Insurance Data. [March 29, 2012]
Survival Analysis of Left Truncated Income Protection Insurance Data [March 29, 2012] 1 Qing Liu 2 David Pitt 3 Yan Wang 4 Xueyuan Wu Abstract One of the main characteristics of Income Protection Insurance
Basics of Statistical Machine Learning
CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu [email protected] Modern machine learning is rooted in statistics. You will find many familiar
Exercise 1.12 (Pg. 22-23)
Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.
Statistics 104: Section 6!
Page 1 Statistics 104: Section 6! TF: Deirdre (say: Dear-dra) Bloome Email: [email protected] Section Times Thursday 2pm-3pm in SC 109, Thursday 5pm-6pm in SC 705 Office Hours: Thursday 6pm-7pm SC
The Variability of P-Values. Summary
The Variability of P-Values Dennis D. Boos Department of Statistics North Carolina State University Raleigh, NC 27695-8203 [email protected] August 15, 2009 NC State Statistics Departement Tech Report
Econometrics Simple Linear Regression
Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight
Lecture 5 : The Poisson Distribution
Lecture 5 : The Poisson Distribution Jonathan Marchini November 10, 2008 1 Introduction Many experimental situations occur in which we observe the counts of events within a set unit of time, area, volume,
Nonparametric adaptive age replacement with a one-cycle criterion
Nonparametric adaptive age replacement with a one-cycle criterion P. Coolen-Schrijner, F.P.A. Coolen Department of Mathematical Sciences University of Durham, Durham, DH1 3LE, UK e-mail: [email protected]
Estimating Industry Multiples
Estimating Industry Multiples Malcolm Baker * Harvard University Richard S. Ruback Harvard University First Draft: May 1999 Rev. June 11, 1999 Abstract We analyze industry multiples for the S&P 500 in
Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011
Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Name: Section: I pledge my honor that I have not violated the Honor Code Signature: This exam has 34 pages. You have 3 hours to complete this
Lecture 8: Gamma regression
Lecture 8: Gamma regression Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Models with constant coefficient of variation Gamma regression: estimation and testing
2. Linear regression with multiple regressors
2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measures-of-fit in multiple regression Assumptions
Density Curve. A density curve is the graph of a continuous probability distribution. It must satisfy the following properties:
Density Curve A density curve is the graph of a continuous probability distribution. It must satisfy the following properties: 1. The total area under the curve must equal 1. 2. Every point on the curve
Chapter 4: Statistical Hypothesis Testing
Chapter 4: Statistical Hypothesis Testing Christophe Hurlin November 20, 2015 Christophe Hurlin () Advanced Econometrics - Master ESA November 20, 2015 1 / 225 Section 1 Introduction Christophe Hurlin
Factorial experimental designs and generalized linear models
Statistics & Operations Research Transactions SORT 29 (2) July-December 2005, 249-268 ISSN: 1696-2281 www.idescat.net/sort Statistics & Operations Research c Institut d Estadística de Transactions Catalunya
Practice problems for Homework 11 - Point Estimation
Practice problems for Homework 11 - Point Estimation 1. (10 marks) Suppose we want to select a random sample of size 5 from the current CS 3341 students. Which of the following strategies is the best:
A Coefficient of Variation for Skewed and Heavy-Tailed Insurance Losses. Michael R. Powers[ 1 ] Temple University and Tsinghua University
A Coefficient of Variation for Skewed and Heavy-Tailed Insurance Losses Michael R. Powers[ ] Temple University and Tsinghua University Thomas Y. Powers Yale University [June 2009] Abstract We propose a
Comparison of sales forecasting models for an innovative agro-industrial product: Bass model versus logistic function
The Empirical Econometrics and Quantitative Economics Letters ISSN 2286 7147 EEQEL all rights reserved Volume 1, Number 4 (December 2012), pp. 89 106. Comparison of sales forecasting models for an innovative
2013 MBA Jump Start Program. Statistics Module Part 3
2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just
Centre for Central Banking Studies
Centre for Central Banking Studies Technical Handbook No. 4 Applied Bayesian econometrics for central bankers Andrew Blake and Haroon Mumtaz CCBS Technical Handbook No. 4 Applied Bayesian econometrics
X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)
CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.
**BEGINNING OF EXAMINATION** The annual number of claims for an insured has probability function: , 0 < q < 1.
**BEGINNING OF EXAMINATION** 1. You are given: (i) The annual number of claims for an insured has probability function: 3 p x q q x x ( ) = ( 1 ) 3 x, x = 0,1,, 3 (ii) The prior density is π ( q) = q,
