Lecture 10: Other Continuous Distributions and Probability Plots

Devore: Sections 4.4-4.6

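Gamma Distribution

The gamma function is a natural extension of the factorial. For any $\alpha > 0$,

$$\Gamma(\alpha) = \int_0^\infty x^{\alpha-1} e^{-x}\,dx$$

Properties:
1. If $\alpha > 1$, $\Gamma(\alpha) = (\alpha-1)\,\Gamma(\alpha-1)$
2. $\Gamma(n) = (n-1)!$ for any $n \in \mathbb{Z}^+$
3. $\Gamma(1/2) = \sqrt{\pi}$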
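The three properties above can be checked numerically. The following is a minimal sketch in base R; the helper name `gamma_integral` and the test value `alpha = 3.5` are illustrative choices, not part of the lecture.

```r
# Evaluate the defining integral of the gamma function numerically.
# gamma_integral is a hypothetical helper introduced for this sketch.
gamma_integral <- function(alpha) {
  integrate(function(x) x^(alpha - 1) * exp(-x), lower = 0, upper = Inf)$value
}

alpha <- 3.5
integral_value <- gamma_integral(alpha)   # compare with the built-in gamma(alpha)

# Property 1: Gamma(alpha) = (alpha - 1) * Gamma(alpha - 1)
recursion_gap <- gamma(alpha) - (alpha - 1) * gamma(alpha - 1)

# Property 2: Gamma(n) = (n - 1)!, here with n = 5
factorial_gap <- gamma(5) - factorial(4)

# Property 3: Gamma(1/2) = sqrt(pi), so its square should be close to pi
half_squared <- gamma(0.5)^2
```

Both gaps should be zero up to floating-point error, and `integral_value` should agree with `gamma(3.5)` to within the tolerance of `integrate`.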

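The natural definition of a density based on the gamma function is

$$f(x; \alpha) = \begin{cases} \dfrac{x^{\alpha-1} e^{-x}}{\Gamma(\alpha)} & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

A gamma density with parameters $\alpha > 0$, $\beta > 0$ is

$$f(x; \alpha, \beta) = \begin{cases} \dfrac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha-1} e^{-x/\beta} & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

$\alpha$ is a shape parameter and $\beta$ is a scale parameter.
The case $\beta = 1$ is called the standard gamma distribution.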

[Figure: Gamma pdf, graphical illustration]

Gamma Distribution Parameters

The mean and variance of a random variable $X$ having the gamma distribution $f(x; \alpha, \beta)$ are

$$E(X) = \alpha\beta \quad \text{and} \quad V(X) = \alpha\beta^2$$

Let $X$ have a gamma distribution with parameters $\alpha$ and $\beta$. Then

$$P(X \le x) = F(x; \alpha, \beta) = F(x/\beta; \alpha)$$

In the above,

$$F(x; \alpha) = \int_0^x \frac{y^{\alpha-1} e^{-y}}{\Gamma(\alpha)}\,dy$$

is an incomplete gamma function. It is defined for any $x > 0$.
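A short sketch of how these facts look in R, assuming the built-in `pgamma` and `dgamma` with their `shape` and `scale` arguments. The values `alpha = 2`, `beta = 3`, and `x = 4` are arbitrary illustrations.

```r
alpha <- 2   # shape (illustrative value)
beta  <- 3   # scale (illustrative value)
x     <- 4

# Scaling property: F(x; alpha, beta) = F(x / beta; alpha)
p_scaled   <- pgamma(x, shape = alpha, scale = beta)
p_standard <- pgamma(x / beta, shape = alpha)   # standard gamma, scale = 1

# Mean and variance by numerical integration of the density
dens <- function(t) dgamma(t, shape = alpha, scale = beta)
mean_numeric <- integrate(function(t) t * dens(t), 0, Inf)$value
var_numeric  <- integrate(function(t) (t - mean_numeric)^2 * dens(t), 0, Inf)$value

# Closed forms from the slide
mean_formula <- alpha * beta      # 6
var_formula  <- alpha * beta^2    # 18
```

The two probabilities should coincide, and the numerical moments should match the closed forms up to integration error.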

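Exponential Distribution as a Special Case of Gamma Distribution

Assume that $\alpha = 1$ and $\beta = 1/\lambda$. Then

$$f(x; \lambda) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

Its mean and variance are

$$E(X) = \frac{1}{\lambda} \quad \text{and} \quad V(X) = \frac{1}{\lambda^2}$$

Note that $\mu = \sigma$ in this case.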
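The special-case relationship can be confirmed with a minimal check in R: a gamma density with shape 1 and scale 1/λ should coincide with the exponential density with rate λ. The rate and evaluation points below are illustrative.

```r
lambda <- 0.2                  # illustrative rate
x <- c(0.5, 1, 5, 10)          # illustrative evaluation points

gamma_density <- dgamma(x, shape = 1, scale = 1 / lambda)
exp_density   <- dexp(x, rate = lambda)

# Largest absolute difference between the two densities
max_gap <- max(abs(gamma_density - exp_density))

# Mean and standard deviation are both 1 / lambda
mu    <- 1 / lambda
sigma <- sqrt(1 / lambda^2)
```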

[Figure: Exponential pdf, graphical illustration]

Exponential cdf

The exponential cdf can be obtained easily by integrating the pdf, unlike the cdf of the general gamma distribution. The result is

$$F(x; \lambda) = \begin{cases} 1 - e^{-\lambda x} & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

Example I

Suppose the response time $X$ at an on-line computer terminal has an exponential distribution with expected response time 5 sec. Then $E(X) = 1/\lambda = 5$ and $\lambda = 0.2$.

The probability that the response time is between 5 sec and 10 sec is

$$P(5 \le X \le 10) = F(10; 0.2) - F(5; 0.2) = 0.233$$

In R: pexp(10, 0.2) - pexp(5, 0.2)

Example II

On average, 3 trucks per hour arrive at a given warehouse to be unloaded. What is the probability that the time between arrivals is less than 5 min? At least 45 min?

Arrivals follow a Poisson process with parameter $\lambda = 3$ per hour; the corresponding exponential distribution of the time between arrivals (in hours) also has parameter $\lambda = 3$. With 5 min = 1/12 hour and 45 min = 3/4 hour,

$$P(X < 1/12) = \int_0^{1/12} 3e^{-3x}\,dx = 1 - e^{-1/4} = 0.221$$

$$P(X \ge 3/4) = \int_{3/4}^{\infty} 3e^{-3x}\,dx = e^{-9/4} = 0.105$$
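Both probabilities in Example II can be reproduced with `pexp`, in the same way as Example I. This is a sketch that measures time in hours, so 5 min and 45 min become 5/60 and 45/60; the variable names are illustrative.

```r
rate <- 3   # arrivals per hour

# P(time between arrivals < 5 min)
p_less_5min <- pexp(5 / 60, rate = rate)

# P(time between arrivals >= 45 min), using the upper tail directly
p_at_least_45min <- pexp(45 / 60, rate = rate, lower.tail = FALSE)

round(c(p_less_5min, p_at_least_45min), 3)
```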

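Weibull Distribution

$X$ is said to have a Weibull distribution if

$$f(x; \alpha, \beta) = \begin{cases} \dfrac{\alpha}{\beta^{\alpha}}\, x^{\alpha-1} e^{-(x/\beta)^{\alpha}} & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

Note that the Weibull distribution is yet another generalization of the exponential; indeed, if $\alpha = 1$, the Weibull pdf is exponential with $\lambda = 1/\beta$.

The Weibull cdf is

$$F(x; \alpha, \beta) = \begin{cases} 1 - e^{-(x/\beta)^{\alpha}} & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$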

The parameters α > 0 and β > 0 are referred to as the shape and scale parameters, respectively.
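The Weibull pdf and cdf can be compared against R's built-in `dweibull` and `pweibull`, whose `shape` and `scale` arguments play the roles of α and β. The sketch below uses illustrative parameter values and also checks the α = 1 reduction to the exponential.

```r
alpha <- 2                    # shape (illustrative)
beta  <- 4                    # scale (illustrative)
x <- c(0.5, 1, 3, 6)          # illustrative evaluation points

# pdf and cdf written out from the formulas
pdf_formula <- (alpha / beta^alpha) * x^(alpha - 1) * exp(-(x / beta)^alpha)
cdf_formula <- 1 - exp(-(x / beta)^alpha)

# Built-in equivalents
pdf_builtin <- dweibull(x, shape = alpha, scale = beta)
cdf_builtin <- pweibull(x, shape = alpha, scale = beta)

# With alpha = 1 the Weibull pdf is exponential with lambda = 1 / beta
weibull_shape1 <- dweibull(x, shape = 1, scale = beta)
exponential    <- dexp(x, rate = 1 / beta)
```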

[Figure: Weibull densities, graphical illustration]

Alternative definition of Weibull distribution

The Weibull distribution is used in survival analysis and for many engineering processes, such as engine emissions and degradation data analysis.

Often, the Weibull distribution is also defined as bounded from below by a minimum value $\gamma$.

Let $X$ be the corrosion weight loss for a small square magnesium alloy plate. The plate has been immersed for 7 days in an inhibited 20% solution of MgBr$_2$. Suppose the minimum possible weight loss is $\gamma = 3$ and $X - 3$ has a Weibull distribution with $\alpha = 2$ and $\beta = 4$.

$$F(x; 2, 4, 3) = \begin{cases} 1 - e^{-[(x-3)/4]^2} & \text{if } x \ge 3 \\ 0 & \text{if } x < 3 \end{cases}$$

Then, e.g., the probability $P(X > 3.5) = 1 - F(3.5; 2, 4, 3) = 0.985$.
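The same probability can be computed in R by shifting the observation by the lower bound before calling `pweibull`. This is a sketch; the variable name `gamma_min` for the lower bound γ is an illustrative choice.

```r
gamma_min <- 3   # minimum possible weight loss (the lower bound)
alpha <- 2       # shape
beta  <- 4       # scale

# P(X > 3.5) = P(X - 3 > 0.5), where X - 3 is Weibull(alpha, beta)
p_exceed <- pweibull(3.5 - gamma_min, shape = alpha, scale = beta,
                     lower.tail = FALSE)

# The same value from the closed-form cdf
p_closed_form <- exp(-((3.5 - gamma_min) / beta)^alpha)

round(p_exceed, 3)
```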

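Lognormal distribution

If $Y = \log(X)$ is normal, $X$ is said to have a lognormal distribution. Its pdf is

$$f(x; \mu, \sigma) = \begin{cases} \dfrac{1}{\sqrt{2\pi}\,\sigma x}\, e^{-[\log(x) - \mu]^2/(2\sigma^2)} & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases}$$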

[Figure: Lognormal densities, graphical illustration]

Note that $\mu$ and $\sigma^2$ are NOT the mean and variance of the lognormal distribution. Those are

$$E(X) = \exp(\mu + \sigma^2/2) \quad \text{and} \quad Var(X) = \exp(2\mu + \sigma^2)\,(\exp(\sigma^2) - 1)$$

Also note that the lognormal distribution is not symmetric but rather positively skewed.

The lognormal distribution is commonly used to model various material properties.

To find probabilities related to the lognormal distribution, note that

$$P(X \le x) = P(\log(X) \le \log(x)) = P\left(Z \le \frac{\log(x) - \mu}{\sigma}\right) = \Phi\left(\frac{\log(x) - \mu}{\sigma}\right)$$
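A minimal sketch of these lognormal facts in R, assuming the built-in `plnorm` and `dlnorm` (with `meanlog` and `sdlog` as µ and σ) and the standard normal cdf `pnorm`. The parameter values are illustrative.

```r
mu    <- 1      # mean of log(X) (illustrative)
sigma <- 0.5    # standard deviation of log(X) (illustrative)
x     <- 4

# P(X <= x) two ways: built-in lognormal cdf and the standard normal cdf
p_builtin <- plnorm(x, meanlog = mu, sdlog = sigma)
p_via_phi <- pnorm((log(x) - mu) / sigma)

# Mean and variance from the closed forms
mean_formula <- exp(mu + sigma^2 / 2)
var_formula  <- exp(2 * mu + sigma^2) * (exp(sigma^2) - 1)

# The same moments by numerical integration of the density
dens <- function(t) dlnorm(t, meanlog = mu, sdlog = sigma)
mean_numeric <- integrate(function(t) t * dens(t), 0, Inf)$value
var_numeric  <- integrate(function(t) (t - mean_numeric)^2 * dens(t), 0, Inf)$value
```

Note that `mean_formula` differs from `exp(mu)`, which illustrates why µ is not the mean of the lognormal distribution.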

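Beta Distribution

The beta distribution is used to model random quantities whose density is positive only on a closed interval $[A, B]$. It gives us much more flexibility than the uniform distribution.

A random variable $X$ is said to have a beta distribution with parameters $A$, $B$, $\alpha > 0$ and $\beta > 0$ if its pdf is

$$f(x; \alpha, \beta, A, B) = \begin{cases} \dfrac{1}{B-A} \cdot \dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)} \left(\dfrac{x-A}{B-A}\right)^{\alpha-1} \left(\dfrac{B-x}{B-A}\right)^{\beta-1} & \text{if } A \le x \le B \\ 0 & \text{otherwise} \end{cases}$$

The mean and variance of $X$ are

$$\mu = A + (B-A)\,\frac{\alpha}{\alpha+\beta} \quad \text{and} \quad \sigma^2 = \frac{(B-A)^2\,\alpha\beta}{(\alpha+\beta)^2\,(\alpha+\beta+1)}$$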

[Figure: Beta densities, graphical illustration]

Example: Introduction

The beta distribution is commonly used to model variation in the proportion of a quantity occurring in different samples (e.g. the proportion of a 24-hour day that an individual is asleep, or the proportion of a certain element in a chemical compound).

Consider PERT, the Program Evaluation and Review Technique. PERT is a method used to coordinate the various activities that a large project may consist of. The standard assumption is that the time necessary to complete a particular activity once it has been started is Beta on an interval $[A, B]$ with some $\alpha$ and $\beta$.

Example: Calculations

Suppose that in constructing a single-family house the time $X$ (in days) needed to lay a foundation is Beta on $[2, 5]$ with $\alpha = 2$ and $\beta = 3$.

All beta-related calculations in R are done using the so-called standard beta distribution, which is defined on $[0, 1]$.

To compute $P(X \le 3)$ we need to transform our beta distribution to the standard one:

    pbeta((3 - 2) / (5 - 2), shape1 = 2, shape2 = 3)
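The rescaling step can be wrapped in a small helper, and the mean and variance formulas for the general beta distribution evaluated for the same example. This is a sketch; `pbeta_general` is a hypothetical helper name, not a function from R or from the lecture.

```r
# cdf of a beta distribution on [A, B], obtained by rescaling to [0, 1].
# pbeta_general is a hypothetical helper introduced for this sketch.
pbeta_general <- function(x, alpha, beta, A, B) {
  pbeta((x - A) / (B - A), shape1 = alpha, shape2 = beta)
}

A <- 2; B <- 5          # interval endpoints from the example
alpha <- 2; beta <- 3   # shape parameters from the example

p_le_3 <- pbeta_general(3, alpha, beta, A, B)   # P(X <= 3)

# Mean and variance from the closed-form expressions
mean_x <- A + (B - A) * alpha / (alpha + beta)
var_x  <- (B - A)^2 * alpha * beta / ((alpha + beta)^2 * (alpha + beta + 1))

round(c(p_le_3, mean_x, var_x), 3)
```

For these parameters the probability works out to 33/81, about 0.407, with a mean of 3.2 days and a variance of 0.36.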

Sample Percentiles I

How can we define sample percentiles?

First, let us order the $n$ sample observations from smallest to largest as $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$.

We know how to define the sample median for both odd and even $n$. For example, if $n = 11$, the median is $x_{(6)}$; this observation is assumed to lie in both halves of the data. We can treat the lower quartile (25th percentile) in the same way.

In general, we say that the $i$th smallest observation in the list is taken to be the $\frac{100(i - 0.5)}{n}$th sample percentile.

Sample Percentiles II

The intermediate percentages are obtained by linear interpolation. For example, if $n = 10$, one obtains the 5th percentile and the 15th percentile by setting $i = 1$ and $i = 2$ in the above expression. Then the 10th percentile is halfway between those two.

We will not need the interpolated percentiles when constructing probability plots.
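The percentile definition and the interpolation step can be sketched in R as follows. The ten data values are made up for illustration, and `approx` is used here as one way to do the linear interpolation.

```r
# Illustrative sample of n = 10 observations (invented values)
x <- c(12, 7, 3, 9, 15, 21, 5, 18, 10, 14)
n <- length(x)
i <- seq_len(n)

x_sorted <- sort(x)              # x_(1) <= ... <= x_(n)
pct <- 100 * (i - 0.5) / n       # percentile assigned to the i-th smallest value

# 10th percentile by linear interpolation between the 5th and 15th percentiles
p10 <- approx(pct, x_sorted, xout = 10)$y

data.frame(percentile = pct, observation = x_sorted)
```

Here the two smallest observations are 3 and 5, so the interpolated 10th percentile is 4.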

Normal Probability Plot

A plot of the $n$ pairs

[$\frac{100(i - 0.5)}{n}$th $z$ percentile, $i$th smallest observation]

is referred to as the normal probability plot.

Generate 20 observations from the gamma distribution with $\alpha = 2$ and $\beta = 1$. The following normal probability plot illustrates just how unlike the normal distribution the gamma distribution is:

    x <- rgamma(20, 2)
    qqnorm(x)
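To make the definition concrete, the pairs can be built by hand and compared with what `qqnorm` computes. This is a sketch: the seed is arbitrary, and the agreement with `qqnorm` relies on R's `ppoints` using the (i − 0.5)/n positions, which it does for samples with more than 10 observations.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
x <- rgamma(20, shape = 2)        # alpha = 2, beta = 1 (the default scale)
n <- length(x)

# z percentiles corresponding to 100 * (i - 0.5) / n, for i = 1, ..., n
z <- qnorm((seq_len(n) - 0.5) / n)

# Manual normal probability plot: z percentile against i-th smallest observation
plot(z, sort(x), xlab = "z percentile", ylab = "Ordered observation")

# qqnorm returns the same pairs, ordered as in the original data
qq <- qqnorm(x, plot.it = FALSE)
same_quantiles <- isTRUE(all.equal(sort(qq$x), z))
```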

Example of Several Probability Plots for Different Samples from the Standard Normal Distribution

If the observations are drawn from $Z \sim N(0, 1)$, a straight line through the origin at 45 deg is expected.

This example helps one learn to avoid overinterpreting deviations from the straight-line pattern.

R code:

    par(mfrow = c(2, 2))    # 2 x 2 grid of plots
    for (i in 1:4) {
      x <- rnorm(100)       # a fresh standard normal sample each time
      qqnorm(x)
    }

What to expect from a normal probability plot?

If the sample observations are drawn from a normal distribution $N(\mu, \sigma^2)$, the points should fall close to a straight line with slope $\sigma$ and intercept $\mu$.

The following are the common deviations from normality:
1. Lighter tails than the normal ones, but the distribution is symmetric
2. Heavier tails than the normal ones, but the distribution is symmetric
3. The distribution is skewed
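The slope and intercept statement can be illustrated by fitting a least-squares line to the probability-plot pairs of a simulated normal sample. This is only a sketch under assumed values (µ = 10, σ = 2, n = 1000, arbitrary seed); the fitted coefficients are estimates and will vary with the sample.

```r
set.seed(42)                          # arbitrary seed
x <- rnorm(1000, mean = 10, sd = 2)   # illustrative mu = 10, sigma = 2
n <- length(x)

# Probability-plot pairs: z percentile and ordered observation
z <- qnorm((seq_len(n) - 0.5) / n)
fit <- lm(sort(x) ~ z)

intercept_hat <- unname(coef(fit)[1])   # should be near mu
slope_hat     <- unname(coef(fit)[2])   # should be near sigma
```

Base R also provides `qqline`, which adds a reference line through the first and third quartiles rather than a least-squares fit.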

Case Illustration

When the distribution is light-tailed, the extrema are not as extreme as what you'd expect from a normal distribution. The result is an S-shaped pattern. Uniform distribution example:

    x <- runif(100)
    qqnorm(x)

When the distribution is heavy-tailed, the result is a different S-curve, with more extreme extrema. Cauchy distribution example:

    x <- rcauchy(100)
    qqnorm(x)

If the underlying distribution is positively skewed, the smallest and the largest observations are larger than the normal would supply... Exponential distribution example:

    x <- rexp(100)
    qqnorm(x)

Data example: forearm lengths dataset

Forearm lengths are an example of a linear measurement. Can they be normally distributed?

    forearms <- as.vector(as.matrix(forearms))   # flatten the loaded data to a plain vector
    par(mfrow = c(1, 2))                         # two plots side by side
    boxplot(forearms, main = "Boxplot")
    qqnorm(forearms)