Software for the analysis of extreme events: The current state and future directions


Extremes (2006) 8
Software for the analysis of extreme events: The current state and future directions
Alec Stephenson & Eric Gilleland
Received: 29 October 2005 / Revised: 29 October 2005 / Accepted: 22 November 2005
© Springer Science + Business Media, LLC 2006

Abstract The last few years have seen a significant increase in publicly available software specifically targeted to the analysis of extreme values. This reflects the increase in the use of extreme value methodology by the general statistical community. The software that is available for the analysis of extremes has evolved in essentially independent units, with most forming extensions of larger software environments. An inevitable consequence is that these units are spread about the statistical landscape. Scientists seeking to apply extreme value methods must spend considerable time and effort in determining whether the currently available software can be usefully applied to a given problem. We attempt to simplify this process by reviewing the current state, and suggest future approaches for software development. These suggestions aim to provide a basis for an initiative leading to the successful creation and distribution of a flexible and extensible set of tools for extreme value practitioners and researchers alike. In particular, we propose a collaborative framework for which cooperation between developers is of fundamental importance.

Keywords Extreme value theory · Software development · Statistical computing

AMS 2000 Subject Classification Primary 62P99

A. Stephenson (*)
Department of Statistics and Applied Probability, Faculty of Science, National University of Singapore, Singapore, Singapore
stasag@nus.edu.sg

E. Gilleland
Research Applications Laboratory, National Center for Atmospheric Research, 3450 Mitchell Lane, Boulder, CO 80301, USA
ericg@ucar.edu

1. Introduction

In many diverse areas of application the extreme values of a process are of primary importance. Internet traffic, rainfall, material strength, pollutant concentrations and insurance claim sizes are all examples of processes of this type. A number of different techniques can be justifiably employed to analyse the extremes of one or more processes, and the need for software for both presentation and evaluation is common to all. This typically takes place on an individual level, where code is written to satisfy the objectives for the specific task at hand. The code may only be applicable to a particular dataset, or may be targeted at a specialised methodology. It would not therefore be suitable for similar problems, and would be of limited use if distributed to a wider community. In some cases though, code is developed or altered to be more general in scope, so that it implements a broader methodology, or can be applied to datasets of a certain form or class. Such code could be supplemented by some form of documentation and be made publicly available as a packaged unit. It is on this basis that the majority of the software introduced in Section 2 has been developed.

The development and distribution of software involves important decisions with regard to elements such as: the programming language or languages and the related environment, the incorporation of external routines or algorithms, the license on release, methods of documentation, the programming interface, and more generally the overall design strategy. The consideration of these issues should be made in the context of the objectives of the project or packaged unit, and preferably prior to development, thus encouraging well designed software. Such issues are discussed further in Section 3.

Because much of the software of Section 2 has been initiated through code written for the purpose of an individual, the decisions outlined in the previous paragraph have largely been based on personal preference. Furthermore, the packaged units of publicly available software are spread about the statistical landscape, requiring scientists seeking to apply extreme value methods to spend considerable time and effort in determining whether the currently available software may be usefully applied to a given problem. Moreover, certain statistical techniques are independently implemented in more than one packaged unit, and there often exist slight differences between implementations with respect to both reliability and methodology. We believe that the creation of a software initiative to develop a reliable set of extreme value routines will be of considerable benefit to the extreme value community. It will also help to promote the methodology to practitioners from a wide range of scientific disciplines. Software developers can play a key role in the success of such an initiative, so the issue of how to attract and encourage developers is critical.

In Section 2 we provide an overview of existing publicly available software for the analysis of extreme values, including a summary of the statistical techniques that are available in each distributed unit. In Section 3 we discuss possible future directions for the development of software for the analysis of extreme values. We propose that a collaborative effort be made, and in order to develop flexible and extensible software, that an open source computing environment be the basis for this effort.
In Section 4 we discuss incentives for software developers and comment on funding issues. A concluding discussion is given in Section 5.

2. The current state

In this section we survey the current software available for the analysis of extreme values. We make no guarantee that all publicly available software will be covered, but all that is known to the authors will be addressed, and we make the presumption that this accounts for all software in common use at the time of writing. We begin by briefly reviewing standard extreme value techniques, in order to provide more clarity when discussing different packaged software units. The categorisation of different methodologies essentially follows Beirlant et al. (2004), but here we give more emphasis to those methods which are implemented more commonly in available software. Alternative texts covering the techniques outlined here include Coles (2001), Reiss and Thomas (2001) and Embrechts et al. (1997). We then summarise the different packages available and discuss the techniques that they implement. We begin this discussion with those units that are global in their aims, and go on to discuss those units that focus on specific aspects of extreme value methodology. We underline that the summary of the utility of software units in terms of the statistical techniques they perform is merely a summary, and that detailed discussions of specific implementations of methodology are sacrificed in order to obtain a simplified overarching view of what is available to the practitioner.

2.1 Extreme value techniques

2.1.1 Classical methodology

Let $X_1, \ldots, X_n$ be a random sample from a distribution $F$, and let $M_n = \max\{X_1, \ldots, X_n\}$. Extreme value theory specifies the class of non-degenerate limiting distributions for the maximum $M_n$ under linear normalisation. This class is given by the parametric form
$$G(x) = \exp\left[ -\left\{ 1 + \xi (x - \mu)/\sigma \right\}_{+}^{-1/\xi} \right], \qquad (2.1)$$
where $\mu$, $\sigma > 0$ and $\xi$ are the location, scale and shape parameters, respectively, and $z_+ = \max(z, 0)$. $G$ is known as the generalised extreme value distribution; the Gumbel distribution is obtained in the limit as $\xi \to 0$. Larger values of $\xi$ yield heavier-tailed distributions. The domain-of-attraction problem considers the type of distributions $F$ that lead to particular forms of $G$. It transpires that the sign of $\xi$ is the dominating factor; for $\xi > 0$ convergence is obtained if and only if $1 - F(x) = x^{-1/\xi} \ell(x)$, where $\ell(\cdot)$ satisfies the slowly varying property that the limit as $x \to \infty$ of the ratio $\ell(xt)/\ell(x)$ is one for all $t > 0$. Similar conditions can be obtained in the cases $\xi = 0$ and $\xi < 0$.

The above result forms the basis for a number of different statistical methods. For large $N$, the generalised extreme value distribution approximates the distribution of $M_N$. Given a dataset of maxima, this approximation can be treated as exact, and hence the distribution can be fitted to the data by estimating the parameters $(\mu, \sigma, \xi)$. This is commonly known as the block maxima approach since it can be employed after dividing the sample $X_1, \ldots, X_n$ into blocks and considering the maxima derived from each block. The estimation can be performed by maximum likelihood, and likelihood inferential methods can subsequently be used. The likelihood problem is non-regular since the end points of the generalised

extreme value distribution depend on the parameters, but it can be shown (Smith, 1985) that the usual properties exist when $\xi > -0.5$, which is typically the case for statistical applications. Alternative estimation methods are possible, and of these, the probability weighted moments method of Hosking et al. (1985) is perhaps the most popular. This method has desirable properties, but it does not easily extend to regression type models which we discuss subsequently. Estimation and inference under the Bayesian paradigm can also be performed (e.g., Stephenson and Tawn, 2004). Given any set of parameter estimates, extreme quantiles of the maximum distribution can be estimated by inverting the estimated generalised extreme value distribution function. Extreme quantiles of the distribution $F$ can be estimated similarly on the basis that $F^n$ approximates $G$.

Although the above result focuses on maxima, it leads to an approximation to the conditional exceedance probability $\Pr(X > x \mid X > u)$ for all $x$ greater than some predetermined large threshold $u$. Specifically, the generalised Pareto distribution (Pickands, 1975) can be obtained as an approximation of the conditional distribution of exceedances. For statistical inference this distribution is often treated as exact, and hence the data that exceed $u$ can be used to derive estimates of the generalised Pareto parameters $(\sigma_u, \xi)$. Maximum likelihood or probability weighted moments estimates can again be used. The conditioning probability $\Pr(X > u)$ can be estimated using the empirical proportion of exceedances of $u$, and extreme quantiles can then be estimated by simply inverting the corresponding model for the unconditional tail probability $\Pr(X > x)$. This technique is known as the peaks over threshold (POT) method. It should be observed that the generalised Pareto scale parameter $\sigma_u$ depends on the threshold $u$. An alternative point process representation typically uses a different parameterisation for which this is not the case, and hence it is preferable in statistical applications to use this representation whenever regression type models, to be discussed subsequently, are employed.

For the moment we assume that $\xi > 0$, and we return to the domain of attraction condition $1 - F(x) = x^{-1/\xi} \ell(x)$. There are a number of statistical methods, and in particular estimates of $\xi$, based on such conditions. Let $X_{1,n} \le \cdots \le X_{n,n}$ denote the order statistics of $X_1, \ldots, X_n$, so that $M_n = X_{n,n}$. Using the domain of attraction condition, the conditional tail probability $\Pr(X/u > y \mid X > u)$ for large $u$ and $y > 1$ is approximately equal to $y^{-1/\xi}$. The maximum likelihood estimate for $\xi$ based on the exceedances of $u$ has a closed form, and for $u$ equal to the $k$th largest order statistic $X_{n-k,n}$ it is given by the Hill estimator (Hill, 1975)
$$\hat{\xi}_H = \frac{1}{k} \sum_{j=1}^{k} \left( \log X_{n-j+1,n} - \log X_{n-k,n} \right). \qquad (2.2)$$
The corresponding tail model is a POT model with $\xi > 0$ and $\sigma_u = \xi u$, and therefore the Hill estimator can be obtained as a maximum likelihood estimator in a constrained POT analysis with threshold $u = X_{n-k,n}$. The Hill estimator is a popular estimator with a simple form. However, it is not location invariant and it can be severely biased if the distributional approximation is poor. These weaknesses have led to proposals of a large number of estimators which can be seen as variants of $\hat{\xi}_H$; see Chapter 4 of Beirlant et al. (2004). We have thus far assumed that $\xi > 0$. More general domain of attraction conditions for the case of $\xi \in \mathbb{R}$ can also be used to construct estimators.
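As a minimal illustration of the block maxima fitting described above (before continuing with estimators based on domain-of-attraction conditions), the following sketch uses base R only to fit the generalised extreme value distribution (2.1) by maximum likelihood and to invert the fitted distribution for an extreme quantile. The simulated data, starting values and quantile level are assumptions made purely for illustration.

```r
## Minimal sketch (base R only): maximum likelihood fitting of the generalised
## extreme value distribution (2.1) to a set of block maxima, followed by
## inversion of the fitted distribution to estimate an extreme quantile.

set.seed(1)
z <- -log(-log(runif(50)))              # 50 simulated Gumbel "annual maxima"

gev_nll <- function(par, x) {           # negative log-likelihood of (2.1)
  mu <- par[1]; sigma <- par[2]; xi <- par[3]
  if (sigma <= 0) return(1e10)
  w <- 1 + xi * (x - mu) / sigma
  if (any(w <= 0)) return(1e10)         # outside the support
  if (abs(xi) < 1e-6) {                 # Gumbel limit as xi -> 0
    y <- (x - mu) / sigma
    return(length(x) * log(sigma) + sum(y) + sum(exp(-y)))
  }
  length(x) * log(sigma) + (1 + 1/xi) * sum(log(w)) + sum(w^(-1/xi))
}

init <- c(mean(z), sd(z), 0.1)          # crude starting values
fit  <- optim(init, gev_nll, x = z, hessian = TRUE)
se   <- sqrt(diag(solve(fit$hessian)))  # standard errors from observed information
cbind(estimate = fit$par, std.error = se)

## 0.99 quantile (100-block return level) by inverting the fitted distribution
mu <- fit$par[1]; sigma <- fit$par[2]; xi <- fit$par[3]
mu + sigma * ((-log(0.99))^(-xi) - 1) / xi
```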
One such simple estimator was proposed by Pickands (1975), and many variants and alternatives have been suggested. A review of such

estimators and associated estimators of extreme quantiles is given in Chapter 5 of Beirlant et al. (2004).

Common to all methods based on the full sample $X_1, \ldots, X_n$ is the choice of a threshold $u$, or equivalently an integer $k$ such that $u = X_{n-k,n}$. Generally estimates are derived for all reasonable values of $u$ or $k$, and then a plot of these estimates against $u$ or $k$ can be constructed. One common method is to then choose the threshold as the lowest value above which the estimates are roughly constant. In a POT analysis using the generalised Pareto representation, the parameter $\sigma_u$ is expected to depend on the threshold, so it is typical to instead plot the threshold-independent quantity $\sigma_u - \xi u$. In all cases the variance of each estimate must be taken into consideration. At higher thresholds the variance becomes much larger because fewer data points are used. This often makes such plots difficult to interpret.

2.1.2 Extended methodology

Suppose we have a stationary sequence $X_1, X_2, \ldots$ of random variables with common marginal distribution function $F$. The block maxima approach can be applied under the weaker stationary assumption provided that a long range dependence constraint (e.g., Leadbetter et al., 1983) is assumed. The dependence affects the parameter estimates, and the extent of this effect can be summarised by the extremal index $0 \le \theta \le 1$, with $\theta = 1$ for both independent processes and dependent processes satisfying a short range dependence restriction. In the stationary case it is $F^{n\theta}$, and not $F^n$, that is approximated by $G$.

Extending methods based on threshold exceedances to stationary processes is more difficult. One approach is to construct a formal model for the process, or more specifically for the exceedances of the process. The data can then be used to estimate the model parameters. Markov chain model structures are commonly assumed. Alternative methods typically involve identifying clusters of exceedances. One common method is to take the maximum of each cluster, and perform a POT analysis on the cluster maxima. Estimates of quantiles and other derived quantities must be adapted to incorporate $\theta$ and hence account for the dependence. Estimates of $\theta$ can be based on its interpretation as the reciprocal of the mean cluster size in the limit as $u \to \infty$. The runs method and the blocks method are two popular methods of estimation.

We now suppose we have covariate information, and we wish to include such information in our model. A typical approach to the inclusion of covariate information is to impose regression models directly on the parameters $(\mu, \sigma, \xi)$ of the generalised extreme value distribution or of the point process representation of the POT method. For example, in the block maxima case, it may be prudent to model the $i$th annual maximum as generalised extreme value with parameters $(\mu_i, \sigma, \xi)$ with $\mu_i = \beta_0 + \beta_1 z_i$, where $z = (z_1, \ldots, z_n)$ are $n$ observations on a single covariate, such as the year of observation, and $i = 1, \ldots, n$. The parameter set is therefore extended from $(\mu, \sigma, \xi)$ to $(\beta_0, \beta_1, \sigma, \xi)$, and the four parameters can be easily estimated using maximum likelihood. The regression models for $(\mu, \sigma, \xi)$ can naturally be extended to incorporate multiple covariates. The form of the models can also be extended to generalised linear types, or to include non-parametric or semi-parametric terms.

The multivariate extension of the limiting distribution (2.1) focuses on componentwise maxima.
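Before turning to the multivariate case in detail, we return briefly to the threshold-choice plots described above. The following sketch, again in base R only with simulated Pareto-type data as an illustrative assumption, computes the Hill estimator (2.2) for a range of values of $k$ and plots the estimates against $k$; the lowest region over which the estimates are roughly constant suggests a threshold.

```r
## Minimal sketch (base R only): Hill estimates (2.2) plotted against k as a
## threshold-choice diagnostic.  The simulated Pareto-type data are illustrative.

set.seed(2)
x  <- runif(1000)^(-1/2)                # Pareto-type tail with true shape 0.5
xs <- sort(x, decreasing = TRUE)        # descending order statistics

hill <- function(k) mean(log(xs[1:k])) - log(xs[k + 1])

k  <- 10:300
xi <- sapply(k, hill)
plot(k, xi, type = "l",
     xlab = "k (number of upper order statistics)",
     ylab = "Hill estimate of the shape parameter")
abline(h = 0.5, lty = 2)                # true value for the simulated data
```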
Briefly, extreme value theory specifies the class of d-dimensional limiting distributions for the componentwise maxima under linear normalisations of

each component. This class is known as the class of multivariate extreme value distributions. The form of the dependence structure has no finite parameterisation; see, e.g., Chapter 8 of Beirlant et al. (2004) for details and domain of attraction conditions. The asymptotic result leads to a multivariate extension of the block maxima approach whereby the multivariate extreme value distribution $G$ replaces the generalised extreme value distribution. Since there is no finite parameterisation, a parametric model for the dependence structure is typically assumed. Alternatively, non-parametric estimates can be used. The multivariate extension to POT methods is not so obvious. There are two common approaches, one based on a point process approach (Coles and Tawn, 1991) and one based on a censored likelihood (Ledford and Tawn, 1996). One difficulty with using extreme value distributions as the basis of an inferential framework is that asymptotic independence cannot be obtained if dependence exists at finite levels (Coles et al., 1999). Model summaries and extensions that attempt to account for this are given in Chapter 9 of Beirlant et al. (2004). Available software does not currently allow for models that account for asymptotic independence, and so we do not discuss this issue further.

2.2 Software summary

We now summarise the different packaged units available, in no particular order of preference. In order to maintain the flow of the discussion, a glossary is provided as an appendix, which contains explanations of acronyms and technical computing terms, details of how software may be obtained, and software authorship. Please note that any web addresses given in the glossary may be liable to change. At first mention the terms appearing in the glossary are marked with the † symbol.

We begin with units which form extensions of the S-Plus† environment. The S-Plus module S+FinMetrics† provides tools for econometric and financial analysis. Loosely speaking, an S-Plus module is an add-on which must be purchased, whereas an S-Plus library is available at no additional charge. S+FinMetrics includes functions for analysing extreme values, and in particular for copula modelling. The extreme value functions in S+FinMetrics are divided into the two categories: Classical Extreme Value Theory and Extreme Value Analysis with Statistical Copula Estimation. The categories are, respectively, based on version four of the library EVIS†, and on the library EVANESCE†. There appears to be little or no difference between the categories and their respective libraries in terms of the methods that they implement. We therefore restrict our attention to the two categories of S+FinMetrics, which we denote as categories (A) and (B), respectively. The library EVIS is openly available, but the library EVANESCE and the related library safd† are only available through an obscure web link and furthermore do not appear on the list of libraries given on the S-Plus web site.

There are a number of units which form extensions of the R† open source statistical software environment, and all those given here can be downloaded from CRAN†. The R package ismev† contains code that performs analyses of Coles (2001). The original S-Plus code on which it is based is available from the web address given in Coles (2001). The package extremes† is primarily a GUI interface to the package ismev, with some additional functionality for clustering and estimation of the extremal index.
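As a hedged illustration of how these CRAN extensions are typically obtained and used, the following sketch assumes the ismev package, together with its gev.fit and gev.diag functions and the portpirie annual-maximum sea-level data set, behaves as described in the package documentation accompanying Coles (2001); details may differ between versions.

```r
## Hedged usage sketch: obtaining and running one of the CRAN extensions.
## Assumes the ismev package, its gev.fit and gev.diag functions and the
## portpirie annual maximum sea-level data behave as documented.

install.packages("ismev")       # download and install from CRAN
library(ismev)

data(portpirie)                 # annual maximum sea levels, as analysed in Coles (2001)
fit <- gev.fit(portpirie[, 2])  # block maxima fit by maximum likelihood
gev.diag(fit)                   # diagnostic plots for the fitted model
```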
We will refer to extremes only when discussing the additional functionality, and we will refer to ismev when discussing functionality common to both packages. The package evir† is an R port, or

conversion, of the S-Plus library EVIS. Since evir implements the same methods as EVIS, we do not consider it further. The packages evd† and evdbayes† implement a number of different methods using classical and Bayesian inference, respectively. Rmetrics† is a project which aims to provide an open source solution for financial and econometric analysis. It includes seven packages, one of which is fextremes†, which in turn requires the package fbasics. Many functions within fextremes are largely based on those within the three packages ismev, evir and evd. The packages VaR† and RandomFields† also contain functions related to extreme value analysis, although neither is primarily focused on extreme value methodology. The former implements the POT method in order to estimate a financial tail quantity known as Value-at-Risk, while the latter contains functions to simulate max-stable processes. A max-stable process has the property that the joint distribution of the process observed at finitely many points is multivariate extreme value.

The MATLAB† package EVIM† implements a wide range of extreme value methods, and is similar in functionality to EVIS. The package EXTREMES† contains a suite of functions written in C++ with a MATLAB GUI for the Windows operating system, providing tools for extreme quantile estimation and tail modelling. WAFO† is a toolbox of MATLAB routines for the statistical analysis and simulation of random waves and random loads, and it contains a number of extreme value routines. The environments Xtremes† and HYFRAN† are stand-alone statistical software systems; the former is associated with the book Reiss and Thomas (2001).

Fig. 1 Screenshot of extremes GUI interface. The R console window containing the command line interface is shown in the background
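As another hedged usage sketch, assuming the evd package's rgev and fgev functions (with arguments loc, scale and shape) behave as documented, simulation of generalised extreme value data followed by a classical maximum likelihood fit looks roughly like this.

```r
## Hedged usage sketch: simulation and classical fitting with the evd package,
## assuming its rgev and fgev functions (arguments loc/scale/shape) behave as
## documented.

library(evd)

x   <- rgev(100, loc = 0, scale = 1, shape = 0.2)  # simulate 100 GEV maxima
fit <- fgev(x)                                     # maximum likelihood fit
fit                                                # estimates and standard errors
```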

A new edition of the book and a new version of the software are due for release. Both environments employ GUI interfaces for implementing extreme value methods. HYFRAN also implements estimation for a large number of non-standard statistical distributions. Xtremes allows user enhancement through the StatPascal programming language. Many of the algorithms used by Xtremes are also available in the finance library of XploRe†. Finally, there are a number of publicly available routines, written in S and FORTRAN code, associated with the book Beirlant et al. (2004); the overall suite of associated software is in development and therefore we do not consider these routines further.

The above description of packaged software units essentially groups the units in terms of the associated software environment, but it is worth considering other categorisations. The majority of packages seek to implement a wide range of standard statistical extreme value methods that are of interest to the developer. The exceptions are evdbayes, VaR, RandomFields and WAFO, each of which has more specialised aims, and where for the latter three, extreme value methods are not the primary focus. The majority of packaged units use a command line interface. The exceptions are extremes, EXTREMES, Xtremes and HYFRAN, which employ GUI interfaces, and hence they are very easy to use. The associated penalty is the lack of flexibility, though for sufficiently able users this is countered by the availability of a command line interface for more complicated tasks. Figs. 1, 2, 3 give typical screenshots for extremes, EXTREMES and Xtremes, respectively.

Fig. 2 Screenshot of EXTREMES GUI interface

2.3 Software implementations

2.3.1 General-purpose units

The packaged units described here are general-purpose because they seek to give the user a suite of functions that enable standard extreme value methods to be easily implemented. The specialised units evdbayes, VaR, RandomFields and WAFO are described subsequently. We discuss the implementations of each general-purpose unit by considering different methodologies, approximately ordered by increasing complexity. Table 1 provides a brief outline of the methods implemented by these units, and could be regarded as a crude summary of the discussion to follow. We emphasise that although a given unit may enable the user to implement a large range of statistical techniques, this does not imply that the unit is recommended above all others, since there is no guarantee that the included routines are robust or reliable. Considerable experience with a given software unit is required in order for this to be determined.

Graphical tools. As with any statistical analysis, the analysis of extremes typically begins with initial exploratory investigations incorporating graphical tools. These tools are largely standard, and are therefore incorporated into standard statistical environments. This may initially appear to be a disadvantage to the user of the stand-alone Xtremes or HYFRAN environments, but the environments themselves each contain an extensive range of graphical tools, incorporating all those which would be required for any regular analysis.

Fig. 3 Screenshot of Xtremes GUI interface

Table 1 A crude summary of the methods implemented in a number of distributed software units

                   BM    POT   DOA   PDG   CST   REG   BBM   BPOT
S+FinMetrics(A)    Y     Y     Y     Y     Y     -     -     L
S+FinMetrics(B)    L     L     -     -     -     -     L     -
ismev              Y     Y     -     Y     -     Y     -     -
extremes           Y     Y     -     Y     Y     -     -     -
evd                Y     Y     -     Y     Y     L     Y     L
fextremes          Y     Y     Y     Y     Y     -     -     -
EVIM               Y     Y     Y     Y     Y     -     -     -
Xtremes            Y     Y     Y     Y     Y     -     Y     L
HYFRAN             Y     Y     -     Y     -     -     -     -
EXTREMES           L     Y     Y     Y     -     -     -     -

The letters Y and L denote implemented methods; the latter denotes implementations to a lesser degree, and a dash denotes no implementation. The column headings summarise different methodologies: block maxima (BM), peaks over threshold (POT), other domain-of-attraction based estimators (DOA), plotting diagnostics (PDG), clustering (CST), non-stationary regression models (REG), bivariate block maxima (BBM), and bivariate peaks over threshold (BPOT). S+FinMetrics(A) has the same functionality as both EVIS and evir. S+FinMetrics(B) has the same functionality as EVANESCE. The software units VaR, RandomFields, WAFO and evdbayes do not appear in the table because they implement specialised methodology.

Of the remaining software units, fextremes, S+FinMetrics(A) and EVIM have functions for quantile plotting, with the former being a little more versatile. Empirical distribution plotting functions are additionally included in these three units as simple wrappers to underlying code. The units also have plots for record development, and fextremes has plots based on ratios of maxima and sums. The commonly used mean excess (or mean residual life) plot is available in all packaged units. We note that since quantile and distribution plotting are fairly standard tools, most users will be able to produce such plots without the need for specific packaged functions, though perhaps in a more rudimentary form.

Classical univariate analysis. The most important distributions in extreme value analyses are the generalised extreme value and generalised Pareto distributions. Consequently, all units contain routines that simulate from these distributions and calculate distribution and quantile functions. The majority of units contain routines that calculate the associated density functions. Other (univariate) distributions may be of interest when considering domain-of-attraction issues. Statistical environments typically provide routines for standard distributions. Non-standard distributions of interest from a domain-of-attraction perspective, such as Burr distributions, are not specifically included in any packaged unit, but with minimal programming experience it is easy to construct such routines within any environment. On a related theme, stable distributions are of interest when considering sums of random variables with infinite variance, and routines related to stable distributions are included in Xtremes and the R package fbasics. The HYFRAN environment and the EXTREMES package also implement statistical distribution fitting for a wide range of distributions that are not necessarily associated with extreme value theory.
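For example, a minimal sketch (base R only) of routines for a Burr distribution of the kind mentioned above, assuming the two-parameter Burr (Type XII) form with survivor function $(1 + x^c)^{-k}$ for $x > 0$; this parameterisation is an assumption made purely for illustration.

```r
## Minimal sketch (base R only): d/p/q/r routines for a Burr (Type XII)
## distribution with survivor function (1 + x^c)^(-k) for x > 0.

dburr <- function(x, c, k) ifelse(x > 0, c * k * x^(c - 1) * (1 + x^c)^(-k - 1), 0)
pburr <- function(q, c, k) ifelse(q > 0, 1 - (1 + q^c)^(-k), 0)
qburr <- function(p, c, k) ((1 - p)^(-1/k) - 1)^(1/c)
rburr <- function(n, c, k) qburr(runif(n), c, k)

x <- rburr(1000, c = 2, k = 1)   # heavy-tailed sample: tail index 1/(c*k) = 0.5
```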

All packaged units allow the user to perform maximum likelihood parameter estimation for the block maxima and POT methods in the iid case, though for the former the implementation of EXTREMES appears somewhat restrictive. Some units also allow for moment based estimation methods. Specifically, S+FinMetrics(A) and EXTREMES have moment based procedures for POT models, and S+FinMetrics(B), fextremes, HYFRAN and Xtremes contain moment based procedures for both models, though standard errors of moment based estimates are generally not given. Xtremes and EXTREMES also contain a number of other estimation methods. S+FinMetrics(B) contains estimation procedures which combine moment based and maximum likelihood methods, but unfortunately all estimation procedures in this unit, including maximum likelihood, are somewhat limited because standard errors of estimates are not returned and there are no diagnostics available subsequent to model fitting.

One consideration for POT modelling is the choice of threshold, and there are two common graphical tools to aid the user in this choice. The first is the mean excess plot discussed above. The second is to plot parameter estimates and confidence intervals at different thresholds for parameters that are expected to remain constant above thresholds at which the asymptotic approximation is valid. The packages ismev and evd implement these plots, while S+FinMetrics(A), EVIM and fextremes implement this plot for the shape parameter and a related plot which shows quantile estimates at each threshold choice. S+FinMetrics(B) also implements this plot for the shape parameter but without pointwise confidence interval bands, so the function in S+FinMetrics(A) is preferred. EXTREMES implements this plot for the parameters of the generalised Pareto distribution, also without pointwise confidence interval bands.

Excluding S+FinMetrics(B), all units have the capacity to produce a number of diagnostic plots subsequent to fitting routines. The uniformly high quality of each unit with regard to diagnostic plotting perhaps reflects the collective belief of the authors that such diagnostics form an invaluable component of statistical inference. Figure 4 shows a screenshot of diagnostic plots for the block maxima method produced using ismev. The plots shown mirror those given in Figure 3.7 of Coles (2001). Figure 5 shows a screenshot of the diagnostic plotting menu and an associated plot for the POT method produced using EVIM. The details of the diagnostic plots illustrated in Figs. 4 and 5 can be found in the documentation for the respective units.

The next stage of the analysis typically involves more formal testing procedures. EXTREMES contains functions for performing some goodness-of-fit tests, highlighting two from the doctoral thesis of Myriam Garrido. All units that produce standard errors can be used to construct Wald tests and confidence intervals for individual parameters. Most units produce the value of the optimised likelihood function or its logarithm, and allow models to be fitted under the restriction $\xi = 0$, and hence likelihood ratio tests can be performed for this hypothesis. The evd package is a little more general in that it allows models to be fitted under any specified restrictions, so more general tests can be performed directly. A related issue is one of profile likelihood plotting.
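A minimal sketch of such a profile likelihood, written in base R only for the generalised Pareto shape parameter, is given below; the data, threshold and grid of shape values are illustrative assumptions, and the horizontal cut-off yields a 95% profile confidence interval.

```r
## Minimal sketch (base R only): profile log-likelihood for the generalised
## Pareto shape parameter.

set.seed(3)
y   <- rexp(2000)                      # raw data (true shape parameter 0)
u   <- quantile(y, 0.95)               # illustrative threshold
exc <- y[y > u] - u                    # threshold excesses

gpd_nll <- function(sigma, xi, z) {    # negative log-likelihood of the GP model
  if (sigma <= 0) return(1e10)
  w <- 1 + xi * z / sigma
  if (any(w <= 0)) return(1e10)
  if (abs(xi) < 1e-6) return(length(z) * log(sigma) + sum(z) / sigma)
  length(z) * log(sigma) + (1 + 1/xi) * sum(log(w))
}

xi.grid <- seq(-0.2, 0.4, length = 61)
prof <- sapply(xi.grid, function(xi)   # maximise over sigma for each fixed xi
  -optimize(gpd_nll, c(1e-6, 10 * sd(exc)), xi = xi, z = exc)$objective)

plot(xi.grid, prof, type = "l",
     xlab = "shape parameter", ylab = "profile log-likelihood")
## 95% profile confidence interval: shape values whose profile log-likelihood
## lies within qchisq(0.95, 1)/2 of the maximum
abline(h = max(prof) - qchisq(0.95, 1) / 2, lty = 2)
```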
The ismev and fextremes packages can plot profile likelihoods for the shape parameter and a given quantile, yielding profile confidence intervals for both, and allowing related likelihood ratio tests to be performed indirectly. The evd package can also plot profile likelihoods, again under a little more generality. S+FinMetrics(A) and EVIM can calculate profile confidence intervals for given quantiles. Excluding S+FinMetrics(B), all units

can produce maximum likelihood estimates and standard errors for extreme quantiles, reflecting the importance of such quantities in an extreme value analysis.

Fig. 4 Screenshot of ismev. Four diagnostic plots are shown. The plots mirror those given in Figure 3.7 of Coles (2001)

A number of packaged units implement estimators based on domain-of-attraction conditions. Such estimators require a threshold or equivalently a given number of upper order statistics, and are typically depicted using a plot of the threshold versus the estimator. If a particular estimate is required, the choice of threshold can be made on the basis of the shape of the plot or on more formal quantities such as the asymptotic mean squared error. S+FinMetrics(A) and EVIM implement this procedure for the Hill estimator (2.2). fextremes additionally implements this procedure for the estimator of Pickands (1975) and for an estimator due to Dekkers et al. (1989). Xtremes and EXTREMES also implement Eq. (2.2) and a number of other estimators.

Temporal dependence. The effect of dependence between observations was outlined in Section 2.1, where the importance of the extremal index was discussed. S+FinMetrics(A) and EVIM allow estimation of the extremal index by the blocks method, whereas evd and extremes allow estimation by the runs method and calculation of the associated clusters. Additionally, extremes implements the estimator of Ferro and Segers (2003). fextremes implements blocks and runs estimates. Xtremes also has a number of clustering functions. Functionality for estimation of the extremal index and the associated identification

of clusters does not imply functionality that allows the index to be taken into account in inference for the POT model. S+FinMetrics(A) and EVIM can account for the index under the point process characterisation, and evd can additionally account for the index under the generalised Pareto characterisation.

Fig. 5 Screenshot of EVIM. The writing on the left is the diagnostic plot menu. The plot depicted on the right is a scatterplot of residuals

Non-stationarity. The R packages evd and ismev allow non-stationary models to be fitted. While the former only permits linear models for the location parameter, the latter permits generalised linear models for all three parameters. Coupled with R functions that calculate covariate basis matrices, ismev can be used to fit models of very general form, such as those incorporating regression splines. Estimation is by maximum likelihood, because moment-based estimators do not easily generalise to non-stationary models. Diagnostic plots which account for the non-stationarity are implemented in each package. Another publicly available R package, VGAM†, allows additional flexibility through the general framework of vector generalised additive models (Yee and Wild, 1996). This package (at the time of writing) is in the development stage, though it will be officially archived on CRAN in the near future (personal communication).

Multivariate analysis. Multivariate extensions of the block maxima approach use multivariate extreme value distributions in the same manner as generalised extreme value distributions are used in the univariate case. Xtremes, evd and

S+FinMetrics(B) contain functions associated with bivariate componentwise maxima. The latter two contain functions for a large number of parametric models. The main difference between them is that S+FinMetrics(B) focuses on copula models which have uniform margins, whereas in evd generalised extreme value marginals are used directly. The implication is that for an extreme value analysis in S+FinMetrics(B) the user must have the ability to use the programming environment to transform the margins appropriately. More importantly, for inference it implies that the marginal and dependence features must be estimated separately. Unfortunately, S+FinMetrics(B) does not return standard errors or optimised likelihood functions, and diagnostic plots are limited. The Xtremes environment focuses on three particular models, and can calculate moment estimates of the dependence parameter in each case. Bivariate extensions of the POT method are currently only implemented on a fairly superficial level. S+FinMetrics(A) implements the point process approach, but only one particular parametric model is currently available. evd implements the censored likelihood approach for a number of different parametric models, but plotting diagnostics are not currently available. The Xtremes environment calculates estimates for three particular models using the method outlined in Reiss and Thomas (2001).

2.3.2 Specialised units

In financial applications, the Value-at-Risk of a financial time series of returns is essentially the level below which a future observation will drop with only a small probability. The small R package VaR implements commonly used estimation methods for this extreme quantile, including the extreme value POT approach. The generalised Pareto tail model is fitted to the negative returns and by subsequent inversion estimates of Value-at-Risk are obtained. VaR also gives confidence intervals and diagnostic plots, though it does not allow for dependence or non-stationarity. The larger package RandomFields includes a function which simulates from different types of max-stable processes. The simulations are based on Gaussian event profile functions (de Haan, 1984). The MATLAB toolbox WAFO contains an extensive set of routines for wave analysis. This includes routines for extreme value techniques such as POT modelling.

The R package evdbayes concentrates only on extreme value methodology, but does so within a Bayesian inferential framework. Under this framework, the parameters $(\mu, \sigma, \xi)$ are treated as random variables for which there is an associated trivariate distribution based on beliefs prior to the data being observed (i.e., the prior distribution). After the data are observed, the model is used to update this belief on the basis of the observations, resulting in a posterior distribution. The package computes Markov chain Monte Carlo (MCMC) simulations from posterior distributions based on the block maxima approach and the POT approach using the point process representation. A related order statistics approach is also implemented. It is necessary to simulate from the posterior distributions because, in all but the simplest cases, their evaluation involves an intractable multidimensional integral. Simple linear models of the form $\mu_i = \beta_0 + \beta_1 z_i$ can be implemented. A limited number of model diagnostics are available, and MCMC diagnostics are available through the R package coda†.
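The following is a minimal, self-contained sketch of the Bayesian block maxima approach just described, using a random-walk Metropolis sampler written in base R. It is not the evdbayes interface; the prior, proposal scales, starting values and simulated data are assumptions made purely for illustration.

```r
## Minimal sketch (base R only) of the Bayesian block maxima approach: a
## random-walk Metropolis sampler for the GEV parameters (mu, sigma, xi) under
## a vague normal prior.  This is not the evdbayes interface.

set.seed(4)
z <- -log(-log(runif(60)))                     # simulated block maxima

gev_llik <- function(p, x) {                   # GEV log-likelihood
  mu <- p[1]; sigma <- p[2]; xi <- p[3]
  if (sigma <= 0) return(-Inf)
  w <- 1 + xi * (x - mu) / sigma
  if (any(w <= 0)) return(-Inf)
  if (abs(xi) < 1e-6) {
    y <- (x - mu) / sigma
    return(-length(x) * log(sigma) - sum(y) - sum(exp(-y)))
  }
  -length(x) * log(sigma) - (1 + 1/xi) * sum(log(w)) - sum(w^(-1/xi))
}
log_prior <- function(p)                       # vague independent normal priors
  sum(dnorm(p, mean = 0, sd = c(50, 50, 1), log = TRUE))
log_post <- function(p, x) gev_llik(p, x) + log_prior(p)

n.iter <- 5000
psd    <- c(0.2, 0.15, 0.1)                    # proposal standard deviations
out    <- matrix(NA, n.iter, 3, dimnames = list(NULL, c("mu", "sigma", "xi")))
cur    <- c(mean(z), sd(z), 0.1)               # starting values
cur.lp <- log_post(cur, z)

for (i in 1:n.iter) {
  prop    <- cur + rnorm(3, sd = psd)          # random-walk proposal
  prop.lp <- log_post(prop, z)
  if (log(runif(1)) < prop.lp - cur.lp) {      # Metropolis accept/reject
    cur <- prop; cur.lp <- prop.lp
  }
  out[i, ] <- cur
}

## posterior summaries after discarding a burn-in period
apply(out[-(1:1000), ], 2, quantile, probs = c(0.025, 0.5, 0.975))
```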

3. Future directions

In this section we discuss the future of the development of software for the analysis of extreme values. We propose the collaborative creation of a software initiative to develop a reliable and coherent set of tools. We feel that this will further increase the use of extreme value methods and will aid the transition of the methodology from a specialised branch of statistics to a conventional constituent of statistical theory. By providing a set of routines regarded as universally dependable, it will also help to sharpen the focus of the practitioner, thereby promoting straightforward application.

There are many issues to be considered when undertaking such a project, and all must be considered in terms of its objectives. In general terms, we suggest that the primary objective should focus on users who have little expertise in extreme value analysis, but whose applications dictate that such methods should be applied. The aim to facilitate appropriate, user-friendly analyses should lead to the development of a consistent collection of basic tools. However, this need not be the only objective focus. It is also possible to account for researchers who wish to construct their own methods. Such research-based objectives are far more ambitious because they require a system of tools that are both transparent and extensible, but we believe it is prudent to consider both application- and research-based users because the added complications that arise through considering both are outweighed by the benefit of increasing the potential user base.

In the remainder of this section we discuss some of the main issues that arise when considering a project of this type. In Section 3.1 we discuss the programming language, the statistical environment and other related issues such as licensing. In Section 3.2 we discuss documentation and feedback, and in Sections 3.3 and 3.4 we, respectively, discuss design and development issues.

3.1 Language and environment

No single statistical computing environment stands out as being uniquely popular among members of the extreme-value community. The environment used for the project must therefore be user friendly and undemanding with respect to standard tasks, given suitable documentation. On a more practical level, it must have good numerical and graphical capabilities, as these are of general fundamental importance. The availability of reliable mathematical and statistical algorithms is also a consideration. For example, likelihood inferential procedures require good optimisation routines, and reliable uniform random number generators are needed to create simulation routines for distributions related to extreme value theory. The user interface is another consideration; a GUI interface provides ease of use, but may lack the flexibility required for more sophisticated procedures or extensions desired by advanced users.

One problem that arises in attracting a user community is that many users will inevitably not be familiar with the environment, and may not want to spend the required learning time simply to use the distributed software. This problem is difficult to resolve. Perhaps the best solution is to emphasise that learning a new environment has advantages beyond that of simply having the ability to use the distributed routines to perform an extreme value analysis. For example, experience of a GUI-based environment yields generic abilities that can be applied to

environments of any analogous type. Similarly, experience of an environment based on a command-line interface often leads to the development of programming skills, and the ability to use statistical routines related to a number of diverse methodologies. Another solution is to target potential users directly. This can be done in a number of ways, such as promoting use of the software in conference sessions or through courses, or by providing a comprehensive tutorial. This can give potential users insight into the potential benefits, and motivate them to undertake the initial learning required to master the fundamental principles.

The consideration of programming language will be partially dependent on the preferences of the main developers, but it will also involve a trade-off between speed of development and speed of application. The coding of methods in a high-level interpreted language (e.g., R) can be done with speed and relative ease, but such methods may not run as quickly as when implemented in a lower level language (e.g., C). A hybrid approach can be taken whereby the speed-critical sections of high-level implementations can be reimplemented in a lower level language. We believe such an approach yields a good compromise, and allows researchers to build on the tools developed using the high-level framework. If the aim of the project is not geared towards extensibility, and is only focused on building a largely static collection of basic tools, then the avoidance of a high-level language becomes a more attractive option.

There are many other elements to the choice of the statistical environment which comprise important components of the overall initiative. The documentation included in the packaged distribution, as discussed in Section 3.2, is among the most important. The statistical environment is also linked closely with the overall design strategy, the latter of which we discuss in more detail in Section 3.3. For example, the exact form of construction of the packaged software units, the details of their construction, and the interdependence between units may all depend to some extent on the environment. The nature of code construction, including concerns such as data structures, object classes, inheritance and the extent to which object oriented methodology is applied, is also influenced by the language and environment.

The license which is given to the software upon public release is another important consideration. In particular, decisions must be made with regard to whether users can read, redistribute and modify the associated source code. These decisions impact the choice of environment because some environments may not allow for such practices. On the other hand, an open source license dictates that such practices must be allowed. The form of license will again depend on the aims of the project. We believe that if the project aims to create a flexible and extensible set of tools for both practitioners and researchers, then an open source environment will give the greatest opportunity for success.

We support an open source environment in the context described above for a number of reasons, the primary reason being transparency. It is important that all users have the opportunity to read the code so that the accuracy and validity of the output can be readily and openly verified. Moreover, experienced users who are unsatisfied by the documentation descriptions can read the code to determine what precise methods are being used.
Similarly, researchers may easily alter and extend the routines allowing the software to be used as a basis for implementing their own methods. This will allow researchers to promote their methods by making available routines that reproduce published results. Finally, the open source paradigm typically leads to beneficial software evolution as users, through interaction with

core developers, adapt and improve the software and fix existing errors or bugs. The pace of software development is then seen to be rapid. We should emphasise that the majority of the advantages that we present in favour of the use of an open source environment primarily benefit researchers, and are of considerably less help for users who have little specialised knowledge of either programming languages or extreme-value methodology. If the aims of the project primarily concern such users, and a largely static compendium of standard tools is the objective focus, then a different model may prove more beneficial.

3.2 Documentation and feedback

Comprehensive and accurate documentation is vital, and is necessary to make the experience of the user as trouble-free as possible. Documentation can be produced in many forms, the more traditional of which include help files incorporated into software units and manuals which can be read by the user in order to get an overview of the capabilities of the unit in use. Both the content and the construction of the documentation will depend on the environment. For example, the documentation for a primarily GUI based system will typically be less prescriptive than that for a command-line interface. Furthermore, many current environments specify certain design constructions on documentation files to achieve standardisation. For example, the developer may be required to write documentation files in a simple markup language in order that they can be converted into various formats.

A related issue to documentation is that of automated testing procedures. More advanced use of documentation allows the developer to run specified segments of executable code in order to test that an error is not created, or to test that a particular result is produced for a given set of inputs or arguments. The ability of an environment to perform such tests and the extent to which this ability is suited to the goals of the project should be considered. More generally, other utilities may exist to aid development and make the process less time consuming, or to guard against other problems that users may face. Version control, installation considerations, documentation errors and platform dependencies are all examples of such issues.

Comprehensive documentation provides guidance to users, and this is often reciprocated by user feedback. Feedback from users is useful in a number of ways, and must be actively encouraged. Most obviously, feedback may highlight problems in the software which can then be resolved by developers. In particular, the local conditions of every user cannot be addressed during development, so that problems occurring under particular system conditions can only be detected by users themselves. Creating software that is useful to a large number of users is important, and user feedback gives developers some guide as to the popularity of the software and to the extent and composition of the user community. Effective documentation is essential here, and users should be referred to the documentation whenever possible. On the other hand, user feedback may elicit cases where the documentation is insufficient, assisting the developer in making improvements there as well.

3.3 Design issues

The code-level design may dictate that certain rules be followed in order to achieve consistency, to the advantage of both users and developers.
At the package level, the form of each packaged unit and the relationships between each unit must be

carefully considered. The way in which the developers collaborate so that they can work on the project simultaneously, and the manner in which the software is distributed, must be specified. For all such issues, the environment used will influence the decisions taken.

The creation of coding guidelines should obviously be made with reference to the goals of the project. If the software is designed to be extensible, careful thought must be given to constructing components that can be expanded or altered to suit the needs of the researcher, and the developer should be open minded to the possibility of such alteration and extension. Otherwise, the resulting code will not be useful as a building block for derivations that may arise in future research.

The software introduced in Section 2 contains a number of useful routines, most written in high-level interpreted languages. It is therefore prudent to consider the extent to which these routines and associated code should be incorporated into any future initiative. Clearly there may be licensing issues to be addressed, so we refer here only to code that can legally be used. Reimplementation of different methods is to be avoided when there exists a universally accepted, reliable routine. We suggest then that existing routines be examined by the core developers and utilised whenever the developers regard those routines as reliable and robust. There is little point in reinventing the wheel, and reliable existing routines should aid development whenever possible.

The construction of the packaged units will depend largely on the environment. Current environments typically have packaging protocols whereby certain file types and structures must be used. This is to ensure consistency and allow the automated identification of version numbers, dependencies, interpreted and compiled code, datasets, documentation and other such components. It may also allow for certain validation tests to be performed. The dependencies between packaged units must be taken into consideration. It may be preferable that certain units perform as individuals, and depend on no other unit. On the other hand, it may seem natural for one unit to require another unit if the former can be seen as an extension of the latter. Decisions of this type can only be made with reference to the routines contained within each unit, and may consist of a compromise between the complexity of the packaged units and the complexity of the interdependencies.

3.4 Development strategy

Collaboration between the core developers is of primary importance, and a system must be in place whereby all developers can work simultaneously on the same project. For a small group of developers, communication is key, and good communication and design is required to ensure that the changes made by one developer do not adversely affect the code of another. The responsibility of a developer is not only to make routines that are reliable and robust, but also to ensure that they fit within the context of the overall project. At any given time, two versions of the software should be maintained: (a) a current version that can allow minor changes and be updated and distributed at frequent intervals; and (b) a development version for more substantial changes that can be accessed by the developers using some kind of management system.
It is typically beneficial for the core developers to initially collaborate and derive, to some specified level of formality, a number of basic rules to ensure consistency with respect to the code and the programming interface, allowing greater ease of

programming and ultimately usability, particularly for groups of similar functions, such as those based on likelihood inference. For example, in the object oriented methodology, data structures, object classes and related methods for those classes can be, at least for standard methods, designed in advance. The absence of such rules may lead to confusion for the user, who may feel lost among a number of unrelated software components, and additional difficulties for the developer, who may be forced to work with code written in vastly different styles. It may also lead to the creation of software units that cannot successfully interact, even when the developer considers such interaction to be beneficial.

The core developers also have responsibility for introducing code written by other contributors, and to encourage the submission of such contributions. A system must be in place to manage such contributions and to assess their suitability. This may consist of checking that the submission satisfies certain advertised criteria, or that the claims made in supporting documentation are fulfilled. The core developers must decide on the level of quality control, and to what extent automated procedures are used.

Collaboration will be the key to the success of any initiative. Common, well-defined aims must not give way to the needs and interests of an individual. Those involved must carefully decide upon the methods to be implemented, and must not focus on a particular methodology preferred by any given research group, else the resulting software will not be employed by the general extreme-value community. Proficiency with regard to both extreme-value methods and computer programming is not necessary for each individual in a collaborative effort. The developers must be willing to work together towards a common goal and to communicate effectively irrespective of geographical separation. Only under a collaborative framework such as this can a new software initiative achieve its objectives.

4. Development dilemmas

Developing well-designed software is a difficult and time-consuming task where the benefits to the academician are often not sufficient to justify the sacrifice, which can restrict progress in research and other areas of responsibility. On the other hand, there are compelling arguments that favour undertaking such a project. In this section we address some of these arguments, and then give some ideas of how relevant journals can facilitate development. Finally, we discuss issues related to funding, including the inevitable problem of assessing the popularity and usefulness of the final product.

New statistical techniques and theory published in journals are not likely to be employed by scientists and other professionals if no tools are available to implement them. When good software does exist, however, the techniques can get wide use, leading to greater numbers of citations for corresponding publications. See, for example, Donoho (2002), who states that one reason for his being a "Highly Cited Author" (as defined by The Institute for Scientific Information) is that his research on wavelets was complemented by the development of free companion software which was promoted in his papers. Related to this, obtaining appreciative users throughout the academic community often leads to numerous new contacts, both in statistics and other areas of research, that can lead to new collaborations. Additionally, the developer will gain further knowledge of programming and other
Collaboration will be the key to the success of any initiative. Common, well-defined aims must not give way to the needs and interests of an individual. Those involved must carefully decide upon the methods to be implemented, and must not focus on a particular methodology preferred by any given research group, else the resulting software will not be employed by the general extreme value community. Proficiency in both extreme value methods and computer programming is not necessary for each individual in a collaborative effort. The developers must be willing to work together towards a common goal and to communicate effectively irrespective of geographical separation. Only under a collaborative framework such as this can a new software initiative achieve its objectives.

4. Development dilemmas

Developing well-designed software is a difficult and time-consuming task, and for the academic the benefits are often insufficient to justify a sacrifice that can restrict progress in research and other areas of responsibility. On the other hand, there are compelling arguments in favour of undertaking such a project. In this section we address some of these arguments, and then give some ideas of how relevant journals can facilitate development. Finally, we discuss issues related to funding, including the inevitable problem of assessing the popularity and usefulness of the final product.

New statistical techniques and theory published in journals are unlikely to be employed by scientists and other professionals if no tools are available to implement them. When good software does exist, however, the techniques can gain wide use, leading to greater numbers of citations for the corresponding publications. See, for example, Donoho (2002), who states that one reason for his being a "Highly Cited Author" (as defined by The Institute for Scientific Information) is that his research on wavelets was complemented by the development of free companion software, which was promoted in his papers. Related to this, obtaining appreciative users throughout the academic community often leads to numerous new contacts, both in statistics and in other areas of research, that can result in new collaborations.

Additionally, the developer will gain further knowledge of programming and other computational issues. A thorough understanding of the methodology involved is obtained, and in many cases the implementation of methods leads to additional insights that inspire new ideas and observations. For core developers, a global view of the available software and methodology can be acquired, and the appreciation of the extreme value community can be anticipated.

Academic recognition for software has not historically been comparable to that based on scientific publication, though this may be changing. It is important for journals to recognise that they can play a positive role in creating and supporting software initiatives such as that proposed in this paper. They can do this by publishing papers demonstrating or reviewing software when that software has a potentially significant role in furthering science. Furthermore, they can encourage authors to cite software used in their papers where appropriate.

Funding is an essential factor when considering an initiative of this magnitude. It would likely be difficult to obtain funding for such a project based solely on developing a software package. Funding is more likely to be acquired if the project is included as part of a grant proposal where the focus is not only on software development, but where the immediate benefit of the software can be justified. The software package extremes, for example, was funded primarily through the broader Weather and Climate Impacts Assessment Science (WCIAS) Initiative (now the WCIAS Program) at the National Center for Atmospheric Research (NCAR) in Colorado. Another possibility is to obtain numerous smaller grants from varying sources, where again successful requests would more likely emphasise the practical applications of the software. For example, it might be possible to obtain some funding from a finance agency, some from an agency interested in hydrology, and so on. The diverse areas of science to which extreme value theory can be applied are indicative of the potential for incorporating software development into cross-disciplinary funding.

Whether a software package is included in a larger grant or is funded as the primary product through one or more funding sources, there will inevitably be a need to demonstrate that the package is useful to the intended community. This is a difficult task, and eliciting feedback from users is not always easy. Often the developer will only hear from users who have problems installing or running the software, and seldom from those who use it successfully. Obtaining statistics on the number of visits to the software's website is one method that has been used to argue for the software's popularity. Such statistics can be misleading, however, because a visit does not mean that the visitor found the software useful; many visitors might not even be users. Another approach is to have users register their use of the package. This can also be difficult, because many potential users will not want to register; requiring registration can therefore deter potential users from using the software at all.

5. Discussion

The software currently available for the analysis of extreme values goes some way towards allowing practitioners to perform complicated tasks relatively easily. On the other hand, different methods and techniques are implemented in a number of different software units, and this can lead to confusion.
The initiation of a project which aims to develop reliable software within a clear and consistent methodo-
