Determining distribution parameters from quantiles

John D. Cook
Department of Biostatistics
The University of Texas M. D. Anderson Cancer Center
P. O. Box 301402, Unit 1409
Houston, TX 77230-1402 USA
cook@mdanderson.org

January 27, 2010

Abstract

Bayesian statistics often requires eliciting prior probabilities from subject matter experts who are unfamiliar with statistics. While most people have an intuitive understanding of the mean of a probability distribution, fewer people understand variance as well, particularly in the context of asymmetric distributions. Prior beliefs may be more accurately captured by asking experts for quantiles rather than for means and variances. This note will explain how to solve for parameters so that common distributions satisfy two quantile conditions. We present algorithms for computing these parameters and point to corresponding software. The distributions discussed are normal, log normal, Cauchy, Weibull, gamma, and inverse gamma. The method given for the normal and Cauchy distributions applies more generally to any location-scale family.

1 Problem statement

Let $X$ be a random variable from a two-parameter family. Given probabilities $p_1$ and $p_2$ we elicit values $x_1$ and $x_2$ such that

$$P(X < x_1) = p_1 \quad\text{and}\quad P(X < x_2) = p_2.$$

For example, we may ask for the 10th and 90th percentiles. For consistency we require

$$(x_1 - x_2)(p_1 - p_2) > 0.$$

This is the only restriction. There is no mathematical requirement for $p_1$ and $p_2$ to be widely separated, though in practice they usually will be.

2 Location-scale families

Suppose we want to find the mean and standard deviation of a normal distribution that has specified quantiles. The derivation below shows how to compute the distribution parameters. The only property of the normal distribution we use is that it is a location-scale family. The same derivation shows how to find the location and scale of any location-scale distribution.

We want to find $\mu$ and $\sigma$ such that a normal random variable $X$ with mean (location) $\mu$ and standard deviation (scale) $\sigma$ satisfies

$$P(X < x_1) = p_1 \quad\text{and}\quad P(X < x_2) = p_2$$

where $x_1 < x_2$ and $p_1 < p_2$.

The random variable $X$ has the same distribution as $\sigma Z + \mu$ where $Z$ is a normal random variable with mean 0 and standard deviation 1. Let $\Phi$ be the CDF of $Z$, i.e. $\Phi(x) = P(Z < x)$. Then the equations

$$P(X < x_i) = p_i$$

are equivalent to

$$P(\sigma Z + \mu < x_i) = p_i$$

and so

$$P\left(Z < \frac{x_i - \mu}{\sigma}\right) = \Phi\left(\frac{x_i - \mu}{\sigma}\right) = p_i.$$

This yields linear equations for the parameters

$$\Phi^{-1}(p_i)\,\sigma + \mu = x_i$$

with the solution

$$\sigma = \frac{x_2 - x_1}{\Phi^{-1}(p_2) - \Phi^{-1}(p_1)}$$

and

$$\mu = \frac{x_1 \Phi^{-1}(p_2) - x_2 \Phi^{-1}(p_1)}{\Phi^{-1}(p_2) - \Phi^{-1}(p_1)}.$$

A log-normal distribution can easily be determined by two quantiles by reducing the problem to that of finding the parameters for a normal distribution.
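The linear solution for $\sigma$ and $\mu$ above can be sketched in a few lines of Python. This is only an illustration, not the author's software: the function names are hypothetical, the standard normal inverse CDF comes from the standard library's `statistics.NormalDist`, and the Cauchy inverse CDF is written out by hand as an example of another location-scale family. The log-normal helper simply takes logarithms of the elicited values, following the reduction to the normal case.

```python
import math
from statistics import NormalDist


def location_scale_from_quantiles(x1, p1, x2, p2, inv_cdf):
    """Return (mu, sigma) so that sigma*Z + mu has P(X < x1) = p1, P(X < x2) = p2.

    inv_cdf is the inverse CDF of the standard family member Z
    (location 0, scale 1). Hypothetical helper for illustration.
    """
    if (x1 - x2) * (p1 - p2) <= 0:
        raise ValueError("need (x1 - x2)(p1 - p2) > 0")
    z1, z2 = inv_cdf(p1), inv_cdf(p2)
    sigma = (x2 - x1) / (z2 - z1)
    mu = (x1 * z2 - x2 * z1) / (z2 - z1)
    return mu, sigma


def cauchy_inv_cdf(p):
    """Inverse CDF of the standard Cauchy distribution."""
    return math.tan(math.pi * (p - 0.5))


def lognormal_from_quantiles(x1, p1, x2, p2):
    """Parameters (mu, sigma) of log X, found by solving the normal problem on logs."""
    return location_scale_from_quantiles(
        math.log(x1), p1, math.log(x2), p2, NormalDist().inv_cdf
    )


# Example: a normal distribution with 10th percentile 10 and 90th percentile 20.
mu, sigma = location_scale_from_quantiles(10.0, 0.1, 20.0, 0.9, NormalDist().inv_cdf)
print(mu, sigma)
```

Passing the inverse CDF as an argument keeps the solver independent of the particular family, which mirrors the observation that the derivation uses only the location-scale property.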

If $X \sim \text{log-normal}(\mu, \sigma)$, then $Y = \log X \sim \text{normal}(\mu, \sigma)$. To find $\mu$ and $\sigma$ so that $P(X < x_i) = p_i$, solve for parameters so that $P(Y < \log x_i) = p_i$.

Note that the argument above could be used to find the location and scale parameters for any location-scale distribution. For example, the procedure could be used to find parameters of a Cauchy distribution. If $Z$ is the distribution family representative with location 0 and scale 1 and $F(x)$ is its CDF, then the scale parameter

$$\sigma = \frac{x_2 - x_1}{F^{-1}(p_2) - F^{-1}(p_1)}$$

and the location parameter

$$\mu = \frac{x_1 F^{-1}(p_2) - x_2 F^{-1}(p_1)}{F^{-1}(p_2) - F^{-1}(p_1)}$$

guarantee that $X = \sigma Z + \mu$ satisfies $P(X < x_1) = p_1$ and $P(X < x_2) = p_2$.

3 Weibull distribution

Next we consider the problem of determining the parameters of a Weibull distribution from two specified quantiles. The Weibull distribution with shape $\gamma$ and scale $\beta$ has CDF

$$F(x; \gamma, \beta) = 1 - \exp\left(-\left(\frac{x}{\beta}\right)^{\gamma}\right).$$

The CDF can be inverted easily:

$$F^{-1}(p; \gamma, \beta) = \beta\,(-\log(1 - p))^{1/\gamma}.$$

If

$$F(x_1; \gamma, \beta) = p_1$$

then

$$F(x_1/\beta; \gamma, 1) = p_1.$$

Thus for every $\gamma > 0$

$$\beta = \beta(\gamma) = \frac{x_1}{F^{-1}(p_1; \gamma, 1)} = \frac{x_1}{(-\log(1 - p_1))^{1/\gamma}}$$

satisfies the first quantile equation $F(x_1; \gamma, \beta) = p_1$. Next we show how to select $\gamma$ to also satisfy the second quantile equation. If $F(x_i; \gamma, \beta(\gamma)) = p_i$ for $i = 1, 2$ then

$$\frac{x_i}{\beta} = F^{-1}(p_i; \gamma, 1).$$

It follows that

$$\frac{x_2}{x_1} = \frac{F^{-1}(p_2; \gamma, 1)}{F^{-1}(p_1; \gamma, 1)} = \left(\frac{\log(1 - p_2)}{\log(1 - p_1)}\right)^{1/\gamma}$$

and so

$$\gamma = \frac{\log(-\log(1 - p_2)) - \log(-\log(1 - p_1))}{\log(x_2) - \log(x_1)}.$$

4 Gamma distribution

We can determine the parameters for a gamma distribution in a manner similar to that used for the Weibull distribution. However, the CDF and inverse CDF of a gamma distribution do not have an elementary closed form and so the proof is less direct.

Let $F(x; \alpha, \beta)$ be the CDF of a gamma distribution with shape $\alpha$ and scale $\beta$. That is,

$$F(x; \alpha, \beta) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}} \int_0^x t^{\alpha - 1} \exp(-t/\beta)\, dt.$$

We will show there is a unique solution to the equations

$$F(x_1; \alpha, \beta) = p_1 \quad\text{and}\quad F(x_2; \alpha, \beta) = p_2$$

provided $0 < x_1 < x_2$ and $0 < p_1 < p_2 < 1$. Note that the gamma is a scale family and so

$$F(x; \alpha, \beta) = F(x/\beta; \alpha, 1).$$

Thus for any $\alpha > 0$,

$$\beta = \frac{x_1}{F^{-1}(p_1; \alpha, 1)}$$

is the unique $\beta$ satisfying

$$F(x_1; \alpha, \beta) = p_1.$$

This says that the set of parameters satisfying one quantile constraint is a curve in the first quadrant. This curve implicitly defines $\beta$ as a function of $\alpha$. The question now is whether the curves corresponding to two different constraints must intersect.
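Before continuing with the gamma case, the closed-form Weibull solution from Section 3 can be checked numerically. The sketch below is a minimal illustration with hypothetical function names; it computes $\gamma$ from the log-ratio formula, then $\beta$ from the first quantile equation, and the CDF is included only so the result can be verified.

```python
import math


def weibull_from_quantiles(x1, p1, x2, p2):
    """Return (shape, scale) of a Weibull distribution with the two given quantiles.

    Hypothetical helper implementing the closed-form expressions of Section 3.
    """
    if not (x1 > 0 and x2 > 0 and 0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("need x1, x2 > 0 and p1, p2 in (0, 1)")
    if (x1 - x2) * (p1 - p2) <= 0:
        raise ValueError("need (x1 - x2)(p1 - p2) > 0")
    # -log(1 - p), computed with log1p for accuracy when p is small.
    h1 = -math.log1p(-p1)
    h2 = -math.log1p(-p2)
    shape = (math.log(h2) - math.log(h1)) / (math.log(x2) - math.log(x1))
    scale = x1 / h1 ** (1.0 / shape)
    return shape, scale


def weibull_cdf(x, shape, scale):
    """Weibull CDF, 1 - exp(-(x/scale)^shape)."""
    return 1.0 - math.exp(-((x / scale) ** shape))


# Example: 10th percentile at 1 and 90th percentile at 5.
shape, scale = weibull_from_quantiles(1.0, 0.1, 5.0, 0.9)
print(shape, scale, weibull_cdf(1.0, shape, scale), weibull_cdf(5.0, shape, scale))
```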

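Assume $x_1 > 0$ and $0 < p_1 < p_2 < 1$ are given. We consider the range of $x_2$ values such that the two quantile constraints can be satisfied. We will show that this range is $(x_1, \infty)$. If we can find $\alpha > 0$ such that

$$\beta = \frac{x_1}{F^{-1}(p_1; \alpha, 1)} = \frac{x_2}{F^{-1}(p_2; \alpha, 1)}$$

then $(\alpha, \beta)$ satisfies both constraints $F(x_i; \alpha, \beta) = p_i$. For every $\alpha$, we can obtain equality above when

$$x_2 = x_1 \frac{F^{-1}(p_2; \alpha, 1)}{F^{-1}(p_1; \alpha, 1)}.$$

The question now is reduced to determining the range of

$$\frac{F^{-1}(p_2; \alpha, 1)}{F^{-1}(p_1; \alpha, 1)}.$$

First we show

$$\lim_{\alpha \to \infty} \frac{F^{-1}(p_2; \alpha, 1)}{F^{-1}(p_1; \alpha, 1)} = 1.$$

As $\alpha \to \infty$, the CDF of a gamma$(\alpha, 1)$ random variable approaches the CDF of a normal$(\alpha, \sqrt{\alpha})$ random variable. Thus for large $\alpha$

$$\frac{F^{-1}(p_2; \alpha, 1)}{F^{-1}(p_1; \alpha, 1)} \approx \frac{\sqrt{\alpha}\,\Phi^{-1}(p_2) + \alpha}{\sqrt{\alpha}\,\Phi^{-1}(p_1) + \alpha} \to 1.$$

Next, we show

$$\lim_{\alpha \to 0} \frac{F^{-1}(p_2; \alpha, 1)}{F^{-1}(p_1; \alpha, 1)} = \infty.$$

For small values of $\alpha$,

$$\frac{1}{\Gamma(\alpha)} \approx \alpha \quad\text{and}\quad \int_0^x t^{\alpha - 1} \exp(-t)\, dt \approx \frac{x^{\alpha}}{\alpha}.$$

This says that as $\alpha \to 0$, $F(x; \alpha, 1) \approx x^{\alpha}$ and so $F^{-1}(p_i; \alpha, 1) \approx p_i^{1/\alpha}$. Therefore since $p_2 > p_1$ we have

$$\frac{F^{-1}(p_2; \alpha, 1)}{F^{-1}(p_1; \alpha, 1)} \approx \left(\frac{p_2}{p_1}\right)^{1/\alpha} \to \infty$$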

as $\alpha \to 0$. This says that for fixed $0 < p_1 < p_2 < 1$ and $0 < x_1 < x_2$, we can always find a value of $\alpha$ such that

$$\frac{F^{-1}(p_2; \alpha, 1)}{F^{-1}(p_1; \alpha, 1)} = \frac{x_2}{x_1} \qquad (1)$$

and hence we can satisfy both quantile conditions. To compute $\alpha$ we can solve equation (1), say using a bisection method.

To prove that the solution produced above is unique it suffices to show that

$$f(\alpha) = \frac{F^{-1}(p_2; \alpha, 1)}{F^{-1}(p_1; \alpha, 1)}$$

is monotone decreasing. We have shown that $f(\alpha)$ is monotone decreasing for sufficiently small and sufficiently large values of $\alpha$ via asymptotic arguments. A complete proof would show that $f(\alpha)$ is monotone for all $\alpha$.

Note that we could solve for the parameters of an inverse gamma random variable $X$ by reducing the problem to finding parameters for the gamma random variable $Y = 1/X$. We have $P(X < x_i) = p_i$ if and only if $P(Y < 1/x_i) = 1 - p_i$.

5 Software

ParameterSolver is a Windows application that provides a convenient user interface for determining distribution parameters to satisfy two quantile conditions. The software also determines distribution parameters given a mean and variance. The software is available from the M. D. Anderson Cancer Center Biostatistics software download site.

http://biostatistics.mdanderson.org/softwaredownload/

This software does not implement the algorithms developed here. Instead it uses optimization to find the parameters that minimize

$$(F(x_1) - p_1)^2 + (F(x_2) - p_2)^2.$$

I did not realize at the time I worked on the software that the quantile constraints could be satisfied exactly or that they could so easily be calculated directly for several distributions. The fact that the software provided parameters that exactly satisfy the quantile constraints prompted this more theoretical note.

The software will solve for beta distribution parameters satisfying two quantile conditions. It appears that these conditions can be satisfied exactly, though I have not proven this.
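The bisection approach suggested in Section 4 for the gamma distribution can be sketched as follows. This is an illustrative, standard-library-only sketch and not what ParameterSolver does: all function names are hypothetical, the gamma CDF is evaluated with the power series for the regularized incomplete gamma function, and the inverse CDF is itself found by bisection on $\log x$. These choices are assumptions made to keep the example self-contained, and they are only suitable for moderate shape values. A practical implementation would more likely call a library inverse CDF.

```python
import math


def gamma_cdf(x, alpha):
    """CDF of gamma(alpha, scale 1) via the series for the incomplete gamma function."""
    if x <= 0:
        return 0.0
    term = 1.0 / alpha
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= x / (alpha + n)
        total += term
    return total * math.exp(alpha * math.log(x) - x - math.lgamma(alpha))


def gamma_inv_cdf(p, alpha):
    """Inverse of gamma_cdf in x, by bisection on log(x)."""
    lo, hi = -1.0, 1.0
    while gamma_cdf(math.exp(lo), alpha) > p:
        lo *= 2.0
    while gamma_cdf(math.exp(hi), alpha) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(math.exp(mid), alpha) < p:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))


def quantile_ratio(alpha, p1, p2):
    """f(alpha) = F^{-1}(p2; alpha, 1) / F^{-1}(p1; alpha, 1)."""
    return gamma_inv_cdf(p2, alpha) / gamma_inv_cdf(p1, alpha)


def gamma_from_quantiles(x1, p1, x2, p2, tol=1e-10):
    """Return (shape alpha, scale beta) by solving equation (1) with bisection."""
    if not (0 < x1 < x2 and 0 < p1 < p2 < 1):
        raise ValueError("need 0 < x1 < x2 and 0 < p1 < p2 < 1")
    target = x2 / x1
    lo = hi = 1.0
    while quantile_ratio(lo, p1, p2) < target:  # ratio grows as alpha -> 0
        lo /= 2.0
    while quantile_ratio(hi, p1, p2) > target:  # ratio tends to 1 as alpha grows
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if quantile_ratio(mid, p1, p2) > target:
            lo = mid
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    beta = x1 / gamma_inv_cdf(p1, alpha)
    return alpha, beta


# Example: 10th percentile at 1 and 90th percentile at 5.
alpha, beta = gamma_from_quantiles(1.0, 0.1, 5.0, 0.9)
print(alpha, beta, gamma_cdf(1.0 / beta, alpha), gamma_cdf(5.0 / beta, alpha))
```

The bracketing step relies only on the two limits established in Section 4, so bisection finds a solution of equation (1) whether or not $f(\alpha)$ is monotone. By the reduction noted at the end of Section 4, inverse gamma parameters could be obtained from the same routine by passing $1/x_2, 1 - p_2, 1/x_1, 1 - p_1$.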