Parametric (theoretical) probability distributions (Wilks, Ch. 4)


Note: parametric: assume a theoretical distribution (e.g., Gaussian). Non-parametric: no assumption made about the distribution.

Advantages of assuming a parametric probability distribution: compaction (just a few parameters); smoothing, interpolation, extrapolation.

Parameter, e.g., $\mu, \sigma$: population mean and standard deviation. Statistic: estimation of a parameter from a sample, e.g., $\bar{x}, s$: sample mean and standard deviation.

Discrete distributions: (e.g., yes/no; above normal, normal, below normal)

Binomial: $E_1 = 1$ (yes, or success); $E_2 = 0$ (no, fail). These are MECE (mutually exclusive and collectively exhaustive). $P(E_1) = p$, $P(E_2) = 1 - p$. Assume $N$ independent trials. How many "yes" outcomes can we obtain in $N$ independent trials? $x = 0, 1, \dots, N-1, N$, i.e., $N+1$ possibilities. Note that $x$ is like a dummy variable.
$$P(X = x) = \binom{N}{x} p^x (1-p)^{N-x}, \qquad \text{where } \binom{N}{x} = \frac{N!}{x!(N-x)!} \text{ and } 0! = 1.$$

Bernoulli is the binomial distribution with a single trial, $N = 1$: $x \in \{0, 1\}$, $P(X=0) = 1-p$, $P(X=1) = p$.

Geometric: number of trials until the next success, i.e., $x-1$ failures followed by a success:
$$P(X = x) = (1-p)^{x-1}\, p, \qquad x = 1, 2, \dots$$
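A quick numerical sketch of these pmfs in Python (assuming SciPy is available; N and p are arbitrary illustrative values, not from the notes):

```python
# Minimal sketch: evaluate the binomial and geometric pmfs defined above.
from scipy.stats import binom, geom

N, p = 10, 0.3                               # hypothetical number of trials and success probability
x = 4
print(binom.pmf(x, N, p))                    # P(X = 4) = C(10,4) p^4 (1-p)^6
print(binom.pmf(range(N + 1), N, p).sum())   # the pmf sums to 1 over x = 0..N

# Geometric: P(X = x) = (1-p)^(x-1) p, for x = 1, 2, ...
print(geom.pmf(3, p))                        # two failures followed by a success: (0.7**2)*0.3
```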

Poisson: approximation of the binomial for small $p$ and large $N$. Events occur randomly at a constant rate (per $N$ trials) $\mu = Np$. The rate per trial $p$ is low, so that events in the same period ($N$ trials) are approximately independent. Example: assume the probability of a tornado in a certain county on a given day is $p = 1/100$. Then the average rate per (90-day) season is $\mu = 90 \times 1/100 = 0.9$.
$$P(X = x) = \frac{\mu^x e^{-\mu}}{x!}, \qquad x = 0, 1, \dots$$
Question: what is the probability of having 0 tornadoes, 1 or 2 tornadoes in a season?

Expected value: probability-weighted mean,
$$\mu = E(X) = \sum_x x\, P(X = x).$$
Example: binomial distribution mean,
$$\mu = E(X) = \sum_{x=0}^{N} x \binom{N}{x} p^x (1-p)^{N-x} = Np.$$
Properties of the expected value:
$$E(f(X)) = \sum_x f(x)\, P(X = x); \qquad E\big(a\, f(X) + b\, g(X)\big) = a\,E(f(X)) + b\,E(g(X)).$$
Example: variance
$$\mathrm{Var}(X) = E\big((X - \mu)^2\big) = \sum_x (x - \mu)^2 P(X = x) = \sum_x x^2 P(X = x) - 2\mu \sum_x x\, P(X = x) + \mu^2 \sum_x P(X = x) = E(X^2) - \mu^2.$$
E.g.: Binomial: $\mathrm{Var}(X) = Np(1-p)$; Geometric: $\mathrm{Var}(X) = \dfrac{1-p}{p^2}$; Poisson: $\mathrm{Var}(X) = Np = \mu$.
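One way to answer the tornado question above, as a SciPy sketch (reading the question as asking about 0, 1, and 2 tornadoes in a season):

```python
# Tornado example: mu = 0.9 events per 90-day season.
from scipy.stats import poisson

mu = 90 * (1.0 / 100.0)                    # average rate per season
for x in (0, 1, 2):
    print(x, poisson.pmf(x, mu))           # ~0.407, ~0.366, ~0.165
print(poisson.mean(mu), poisson.var(mu))   # both equal mu = 0.9 for the Poisson
```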

Continuous probability distributions

Probability density function (PDF) $f(x)$:
$$\int_{x-\delta}^{x+\delta} f(u)\,du = P(|X - x| < \delta).$$

Cumulative probability distribution function (CDF) $F(x)$:
$$F(a) = P(X \le a) = \int_{-\infty}^{a} f(x)\,dx,$$
which is easy to invert: $a(F) = F^{-1}(P)$.

[Figure: a PDF $f(x)$ with the shaded area between $x-\delta$ and $x+\delta$, and the corresponding CDF $F(x)$ with probability $p$ on the vertical axis mapped to the value $a$ on the horizontal axis.]

Note on the use of the CDF for empirical functional relationships: the cumulative distribution function (CDF) can be easily inverted. This allows obtaining functional relationships between variables with different PDFs. For example, if we want to create random numbers with an arbitrary PDF $p(x)$, we first obtain the corresponding CDF $F(x) = \int_{-\infty}^{x} p(u)\,du$ (which varies between 0 and 1). We then use computer-generated random numbers $r_i$ uniformly distributed between 0 and 1 and invert the CDF to create random numbers $x_i$ with the desired PDF: $F(x_i) = r_i$.

[Figure: the uniform random number $r_i$ on the vertical $F(x)$ axis is mapped through the CDF to the value $x_i$ on the horizontal axis.]
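A small sketch of this inverse-CDF idea (the exponential target PDF and the value of its scale parameter are arbitrary choices for illustration):

```python
# Inverse-transform sampling: generate exponential random numbers from uniform r_i.
import numpy as np

rng = np.random.default_rng(0)
r = rng.uniform(0.0, 1.0, size=100_000)   # computer-generated uniform numbers r_i

beta = 2.0                                # hypothetical scale parameter
# For the exponential, F(x) = 1 - exp(-x/beta), so x = F^{-1}(r) = -beta*ln(1 - r).
x = -beta * np.log(1.0 - r)

print(x.mean())                           # should be close to beta
```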

As an example, consider an empirical parameterization of the cloud cover (CC) as a function of the relative humidity (RH) at a grid point in a model. To get a realistic relationship we (scatter) plot (for different latitudes and altitudes) the observed joint distribution of RH and CC.

[Figure: scatter plot of CC (from 0 to 1) versus RH (from 0 to 100%).]

We can get an empirical relationship between RH and CC by computing their CDFs, CDF(RH) and CDF(CC). Then we associate to each value RH$_i$ the corresponding value CC$_i$ such that they have the same cumulative probability: CDF(RH$_i$) = CDF(CC$_i$).

[Figure: the common value $r_i$ of $F$(RH) and $F$(CC) links RH$_i$ (on the RH axis, 0 to 100) to CC$_i$ (on the CC axis, 0 to 1); RH = 100 corresponds to CC = 1.]
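A sketch of this CDF-matching (quantile-mapping) idea in Python; the `rh` and `cc` arrays here are synthetic stand-ins for the observed joint sample, and `cc_from_rh` is a hypothetical helper name:

```python
# Associate to each RH value the CC value with the same empirical cumulative probability.
import numpy as np

rng = np.random.default_rng(1)
rh = rng.uniform(0, 100, size=5000)          # stand-in relative-humidity sample (%)
cc = rng.beta(2.0, 5.0, size=5000)           # stand-in cloud-cover sample (0-1)

rh_sorted = np.sort(rh)
cc_sorted = np.sort(cc)
F = (np.arange(1, rh_sorted.size + 1) - 0.5) / rh_sorted.size  # empirical CDF levels

def cc_from_rh(rh_value):
    """Map an RH value to the CC value with the same empirical CDF: F(RH_i) = F(CC_i)."""
    f = np.interp(rh_value, rh_sorted, F)    # F(RH)
    return np.interp(f, F, cc_sorted)        # invert F(CC)

print(cc_from_rh(80.0))
```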

Expected value: probability-weighted average, as for discrete distributions (where $E = \sum_i g(x_i)\, p(x_i)$):
$$E(g(X)) = \int_{-\infty}^{\infty} g(x)\, f(x)\,dx.$$
For example, the mean is $\mu = E(X) = \int x f(x)\,dx$, and the variance is
$$\mathrm{Var}(X) = E\big((X-\mu)^2\big) = \int (x-\mu)^2 f(x)\,dx = \int (x^2 - 2\mu x + \mu^2) f(x)\,dx = \int x^2 f(x)\,dx - 2\mu \int x f(x)\,dx + \mu^2 \int f(x)\,dx = E(X^2) - \mu^2.$$

An excellent resource is the NIST website www.itl.nist.gov/div898/handbook/index.htm and in particular the gallery of distributions www.itl.nist.gov/div898/handbook/eda/section3/eda366.htm

The Dataplot software seems very nice, but I have not tried it: www.itl.nist.gov/div898/software/dataplot/

Gaussian or normal probability distribution:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}.$$
Standard form ($z$ is dimensionless):
$$z = \frac{x - \mu}{\sigma}; \qquad f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}.$$

Central limit theorem: the average of a set of independent observations will have a Gaussian distribution for a large enough average. For well-behaved atmospheric variables (e.g., T), even a one-day average is approximately Gaussian. Multimodal or skewed variables (e.g., precipitation) require longer averages to look Gaussian.

[Figure: histograms of daily-average vs. monthly-average precipitation; the monthly averages look much more Gaussian.]

How to use a table of Gaussian probabilities: estimate $\mu$ and $\sigma$ from a sample and convert to a standard variable $z = (x - \mu)/\sigma$. The table gives $F(z) = P(Z \le z)$. The area between two values is $P(z_1 \le Z \le z_2) = F(z_2) - F(z_1)$.

Example: what is the probability of a January average temperature of $T \le 0\,^{\circ}$C if $\mu_{T,\mathrm{Jan}} = 4\,^{\circ}$C and $\sigma = 2\,^{\circ}$C?
$$z = \frac{T - 4}{2} = \frac{0 - 4}{2} = -2 \ (\text{minus two standard deviations}); \qquad F(-2) = 0.023.$$
Note that $P(|Z| \ge 2\sigma) = 2\,F(-2) = 2 \times 0.023 \approx 0.05 = 5\%$.

What are the Gaussian terciles for the temperature distribution? $F(z) = 0.666$ gives $z = 0.43$, so
$$T = \mu \pm 0.43\,\sigma = 4\,^{\circ}\mathrm{C} \pm 0.86\,^{\circ}\mathrm{C}.$$
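The same example can be checked with SciPy instead of a printed table; a sketch, using $\mu = 4\,^{\circ}$C and $\sigma = 2\,^{\circ}$C as in the example:

```python
# January-temperature example with the Gaussian CDF and its inverse.
from scipy.stats import norm

mu, sigma = 4.0, 2.0
z = (0.0 - mu) / sigma            # z = -2
print(norm.cdf(z))                # F(-2) ~ 0.023, so P(T <= 0 C) ~ 2.3%
print(2 * norm.cdf(-2))           # P(|Z| >= 2 sigma) ~ 0.046, i.e. about 5%

# Gaussian terciles: F(z) = 2/3 gives z ~ 0.43
z_tercile = norm.ppf(2.0 / 3.0)
print(mu - z_tercile * sigma, mu + z_tercile * sigma)   # ~ 3.14 C and 4.86 C
```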

Other probability distributions:

Gamma: for data > 0, positively skewed, such as precipitation.
$$f(x) = \frac{(x/\beta)^{\alpha-1}\, e^{-x/\beta}}{\beta\,\Gamma(\alpha)},$$
where $\beta$ is the scale parameter and $\alpha$ the shape parameter.

[Figure: gamma PDFs for several values of the shape parameter $\alpha$.]

$\Gamma(\alpha) = (\alpha - 1)\,\Gamma(\alpha - 1)$ is the gamma function (for integers, $\Gamma(n) = (n-1)!$).
$$\mu = E(X) = \int_0^{\infty} x f(x)\,dx = \alpha\beta, \qquad \sigma^2 = E(X^2) - E(X)^2 = \alpha\beta^2.$$
For $\alpha = 1$ the gamma distribution becomes the exponential distribution:
$$f(x) = \frac{1}{\beta}\, e^{-x/\beta} \ \text{ if } x > 0, \quad 0 \text{ otherwise.}$$
Its cumulative distribution function is
$$F(x) = 1 - e^{-x/\beta} \ \text{ if } x > 0, \quad 0 \text{ otherwise.}$$

Beta: for data between 0 and 1 (e.g., RH, cloud cover, probabilities):
$$f(x) = \frac{\Gamma(p+q)}{\Gamma(p)\,\Gamma(q)}\, x^{p-1} (1-x)^{q-1}, \qquad 0 \le x \le 1, \quad p, q > 0.$$
This is a very flexible function, taking on many different shapes depending on its two parameters $p$ and $q$ (see the corresponding figure in Wilks).
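A sketch evaluating these densities with SciPy (parameter values are arbitrary; SciPy's `scale` argument plays the role of $\beta$ here):

```python
# Gamma, exponential, and beta densities as defined above.
import numpy as np
from scipy.stats import gamma, expon, beta

x = np.linspace(0.01, 10, 5)
print(gamma.pdf(x, a=2.0, scale=3.0))    # alpha = 2 (shape), beta = 3 (scale)
print(expon.pdf(x, scale=3.0))           # gamma with alpha = 1 is the exponential
print(gamma.pdf(x, a=1.0, scale=3.0))    # same numbers as the line above

y = np.linspace(0.05, 0.95, 5)
print(beta.pdf(y, 2.0, 5.0))             # beta density on [0, 1] with p = 2, q = 5
```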

$$\mu = \frac{p}{p+q}; \qquad \sigma^2 = \frac{pq}{(p+q)^2 (p+q+1)}.$$
From these equations, the moment estimators (see parameter estimation below) can be derived:
$$\hat{p} = \frac{\bar{x}^2 (1 - \bar{x})}{s^2} - \bar{x}; \qquad \hat{q} = \frac{\hat{p}\,(1 - \bar{x})}{\bar{x}}.$$
For the special case $p = q = 1$, the PDF is $f(x) = 1$, the uniform distribution (between 0 and 1).

Distributions arising from the normal PDF (used for hypothesis testing):

$\chi^2$ (chi-square): if $Z_1, \dots, Z_n$ are standard normal independent variables, then $X = Z_1^2 + \dots + Z_n^2$ has a $\chi^2$ distribution with $n$ degrees of freedom. Table A2 gives the area
$$\alpha = P(X \ge \chi^2_{\alpha, n}).$$
$$\chi^2_n = \mathrm{Gamma}(\beta = 2,\ \alpha = n/2); \qquad E(X) = n; \qquad \mathrm{Var}(X) = E\big((X - n)^2\big) = 2n.$$
For example, with $n = 10$ and $\alpha = 5\%$, from Table A2 we find $\chi^2_{0.05,10} = 18.307$. This means that if we have a data set which is the sum of 10 squared standard normal independent variables (e.g., temperature minus the mean divided by the standard deviation), the expected value is $n = 10$, and the probability of finding a value larger than 18.3 is less than 5%.
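A sketch of the beta moment estimators and of the chi-square critical value quoted above (the beta sample is synthetic, generated with made-up parameters p = 2, q = 5):

```python
# Beta moment estimators and the chi-square critical value for n = 10, alpha = 5%.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
x = rng.beta(2.0, 5.0, size=1000)        # pretend these are observed cloud-cover fractions

xbar, s2 = x.mean(), x.var(ddof=1)
p_hat = xbar**2 * (1 - xbar) / s2 - xbar
q_hat = p_hat * (1 - xbar) / xbar
print(p_hat, q_hat)                      # should land near 2 and 5

print(chi2.ppf(0.95, df=10))             # ~18.307: P(X >= 18.307) = 5% for 10 dof
```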

The exponential distribution is also the same as the chi-square for 2 degrees of freedom. For this reason it is appropriate for the wind speed, $s^2 = u^2 + v^2$.

An important application of the chi-square is to test goodness of fit: if you have a histogram with $n$ bins, and a number of observations $O_i$ and expected number of observations $E_i$ (e.g., from a parametric distribution) in each bin, then the goodness of fit of the PDF to the data can be estimated using a chi-square test:
$$X^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} \quad \text{with } n - 1 \text{ degrees of freedom.}$$
The null hypothesis (that it is a good fit) is rejected at a 5% level of significance if $X^2 > \chi^2(0.05,\, n-1)$.

t-distribution: if $Z$ is normal, and $\chi^2_n = Z_1^2 + \dots + Z_n^2$, then the random variable
$$T = \frac{Z}{\sqrt{\chi^2_n / n}}$$
has a t-distribution with $n$ degrees of freedom (Table A3). If $n > 25$, it is very close to a normal.

[Figure: the normal PDF compared with t-distribution PDFs for small and large n.]

For example: normal distribution (Table A1): $\Phi(1.96) = 0.975$. t-distribution (Table A3), $\alpha = 0.025$: with $n = 10$, $t_{\alpha,n} = 2.228$; with $n = 20$, $t_{\alpha,n} = 2.086$; with $n = \infty$, $t_{\alpha,n} = 1.96$.
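A sketch of this goodness-of-fit test on a synthetic Gaussian sample; the $n-1$ degrees of freedom follow the notes (a stricter version would also subtract the number of fitted parameters), and the bin edges are an arbitrary choice:

```python
# Chi-square goodness-of-fit: observed bin counts vs. counts expected under a fitted Gaussian.
import numpy as np
from scipy.stats import norm, chi2

rng = np.random.default_rng(3)
data = rng.normal(4.0, 2.0, size=500)

edges = np.linspace(data.min(), data.max(), 9)      # 8 bins spanning the sample
observed, _ = np.histogram(data, bins=edges)

mu_hat, sigma_hat = data.mean(), data.std(ddof=1)
probs = np.diff(norm.cdf(edges, mu_hat, sigma_hat))
probs /= probs.sum()                                # renormalize: ignore tail mass outside the bins
expected = probs * data.size

X2 = ((observed - expected) ** 2 / expected).sum()
dof = observed.size - 1                             # n - 1, as in the notes
print(X2, chi2.ppf(0.95, dof))                      # reject the fit at 5% if X2 exceeds the critical value
```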

Parameter estimation (fitting a distribution to observations)

1) Moments fitting: compute the first two moments from a sample,
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i; \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2,$$
and then use these values in the Gaussian or other distribution. For example, for the Gaussian distribution simply use $\hat{\mu} = \bar{x}$, $\hat{\sigma} = s$; and for the gamma distribution, $\bar{x} = \hat{\alpha}\hat{\beta}$ and $s^2 = \hat{\alpha}\hat{\beta}^2$, so that
$$\hat{\beta} = \frac{s^2}{\bar{x}}; \qquad \hat{\alpha} = \frac{\bar{x}^2}{s^2}.$$
The NIST web site gives the relationship of the mean and standard deviation to the parameters of each probability distribution: http://www.itl.nist.gov/div898/handbook/eda/section3/eda366.htm

2) Maximum likelihood method: maximize the probability of the distribution fitting all the observations $\{x_i\}$. The probability of having obtained the observations is the product of the probabilities for each observation (for a Gaussian distribution), i.e.,
$$I(\mu, \sigma) = \prod_{i=1}^{n} f(x_i) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x_i - \mu)^2}{2\sigma^2}},$$
or, maximizing its logarithm,
$$L(\mu, \sigma) = \sum_{i=1}^{n} \ln f(x_i) = -n \ln\sigma - \frac{n}{2}\ln(2\pi) - \sum_{i=1}^{n} \frac{(x_i - \mu)^2}{2\sigma^2}.$$
Then $\dfrac{\partial L}{\partial \mu} = 0$, $\dfrac{\partial L}{\partial \sigma} = 0$ give the maximum likelihood parameters $\mu, \sigma$. Note that for the Gaussian distribution this gives $\hat{\mu} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$ (the same as with the moments fitting), but $\hat{\sigma}^2 = \dfrac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$. The most likely value for the standard deviation is not the unbiased estimator $s$.
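A sketch contrasting the two estimates on a Gaussian sample, plus the gamma moments fit; the data are synthetic and the true parameter values are arbitrary:

```python
# Moments vs. maximum-likelihood estimates for a Gaussian, and a gamma fit by moments.
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(4.0, 2.0, size=30)
n = x.size

mu_hat = x.mean()
sigma_unbiased = np.sqrt(((x - mu_hat) ** 2).sum() / (n - 1))   # s (divides by n - 1)
sigma_mle = np.sqrt(((x - mu_hat) ** 2).sum() / n)              # ML estimate (divides by n)
print(mu_hat, sigma_unbiased, sigma_mle)

# Gamma fit by moments: beta_hat = s^2 / xbar, alpha_hat = xbar^2 / s^2
g = rng.gamma(shape=2.0, scale=3.0, size=1000)
xbar, s2 = g.mean(), g.var(ddof=1)
print(xbar**2 / s2, s2 / xbar)          # alpha_hat ~ 2, beta_hat ~ 3
```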

Note: likelihood is the probability distribution of the truth given a measurement. It is equal to the probability distribution of the measurement given the truth (Edwards, 1984).

Goodness of fit

Methods to test goodness of fit:

a) Plot a PDF over the histogram and check how well it fits (Fig. 4.4), or

b) check how well scatter plots of quantiles from the histogram vs. quantiles from the PDF fall onto the diagonal line (Fig. 4.5):

1) A q-q plot is a plot of the quantiles of the first data set against the quantiles of the second data set. By a quantile, we mean the fraction (or percent) of points below the given value. That is, the 0.3 (or 30%) quantile is the point at which 30% of the data fall below and 70% fall above that value.

2) Both axes are in units of their respective data sets. That is, the actual quantile level is not plotted. If the data sets have the same size, the q-q plot is essentially a plot of one sorted data set against the other sorted data set.

c) Use the chi-square test (see above): $X^2 = \sum_{i=1}^{n} \dfrac{(O_i - E_i)^2}{E_i}$. The fit is considered good (at a 5% level of significance) if $X^2 < \chi^2(0.05,\, n-1)$.
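A sketch of the q-q plot described in (b), assuming matplotlib is available; the data are synthetic and the Gaussian quantiles use the fitted sample mean and standard deviation:

```python
# q-q plot: sorted data against Gaussian quantiles at the same cumulative probabilities.
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
data = np.sort(rng.normal(4.0, 2.0, size=200))

F = (np.arange(1, data.size + 1) - 0.5) / data.size   # empirical quantile levels
theoretical = norm.ppf(F, loc=data.mean(), scale=data.std(ddof=1))

plt.scatter(theoretical, data, s=10)
plt.plot(theoretical, theoretical, color="k")          # points near the diagonal => good fit
plt.xlabel("Gaussian quantiles")
plt.ylabel("Sample quantiles")
plt.show()
```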

Extreme events: Gumbel distribution

Examples: coldest temperature in January, maximum daily precipitation in a summer, maximum river flow in the spring. Note that there are two time scales: a short scale (e.g., a day) and a long scale: a number of years.

Consider now the problem of obtaining the maximum (e.g., warmest temperature) extreme probability distribution.
$$\text{CDF:}\quad F(x) = \exp\left[-\exp\left(-\frac{x - \xi}{\beta}\right)\right]$$
This can be derived from the exponential distribution (von Storch and Zwiers, p. 49). The PDF can be obtained from the CDF:
$$\text{PDF:}\quad f(x) = \frac{1}{\beta}\exp\left[-\exp\left(-\frac{x - \xi}{\beta}\right) - \frac{x - \xi}{\beta}\right]$$

Parameter estimation for the maximum distribution:
$$\hat{\beta} = \frac{s\sqrt{6}}{\pi}; \qquad \hat{\xi} = \bar{x} - \gamma\hat{\beta}, \qquad \gamma = 0.577\ldots \text{ (Euler constant).}$$
Note that $\bar{x} = \hat{\xi} + \gamma\hat{\beta}$ indicates that for the maximum Gumbel distribution the mean is to the right of $\hat{\xi}$, which is the mode (the value for which the PDF is maximum, or most popular value; check the PDF figure). Therefore, for the minimum distribution, since the mean is to the left of the mode (check the PDF figure), the parameters are:
$$\hat{\beta} = \frac{s\sqrt{6}}{\pi}; \qquad \hat{\xi} = \bar{x} + \gamma\hat{\beta} \quad (\text{i.e., } \bar{x} = \hat{\xi} - \gamma\hat{\beta}), \qquad \gamma = 0.577\ldots$$

[Figure: maximum Gumbel and minimum Gumbel PDFs, each with its location parameter $\xi$ marked.]
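A sketch of these moment estimators applied to a hypothetical sample of yearly maxima (`annual_max` is a made-up array drawn from a Gumbel with loc = 25, scale = 3):

```python
# Moment estimators for the maximum-Gumbel parameters: beta_hat = s*sqrt(6)/pi, xi_hat = xbar - gamma*beta_hat.
import numpy as np

gamma_euler = 0.5772156649

rng = np.random.default_rng(6)
annual_max = rng.gumbel(loc=25.0, scale=3.0, size=100)   # stand-in yearly maxima

s = annual_max.std(ddof=1)
beta_hat = s * np.sqrt(6.0) / np.pi
xi_hat = annual_max.mean() - gamma_euler * beta_hat      # maximum case: xbar = xi + gamma*beta
print(xi_hat, beta_hat)                                  # should land near 25 and 3
```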

If we take $\xi = 0$ and $\beta = 1$ we get the standard Gumbel maximum PDF, with the corresponding standard Gumbel maximum CDF (in parentheses):
$$\text{PDF:}\quad f(x) = \frac{1}{\beta}\exp\left[-\exp\left(-\frac{x - \xi}{\beta}\right) - \frac{x - \xi}{\beta}\right] \quad \left(= e^{-e^{-x} - x}\right)$$
$$\text{CDF:}\quad F(x) = \exp\left[-\exp\left(-\frac{x - \xi}{\beta}\right)\right] \quad \left(= e^{-e^{-x}}\right)$$

[Figure: the standard maximum Gumbel PDF and CDF.]

Return year, for the maximum Gumbel distribution: how many years do we need to wait to see $x \ge X$ happen again?
$$P(x \ge 2.3) = 1 - \mathrm{CDF}(x = 2.3) = 1 - 0.9 = 0.1; \qquad \text{return year} = \frac{1}{1 - \mathrm{CDF}} = 10 \text{ years.}$$
The horizontal line indicates CDF = 0.9, corresponding to a return time of 10 years, and the vertical line the 10-year return value, i.e., the value of the standardized variable (e.g., temperature) such that on average we have to wait 10 years until we see such a value (or larger) again. A CDF of 0.99 would correspond to a 100-year return value. Once we choose the CDF for a return value, say
$$F(x) = \exp\left[-\exp\left(-\frac{x - \xi}{\beta}\right)\right] = 0.99$$
for a 100-year return, we can invert it to obtain the 100-year return value itself:
$$x_{100\,\mathrm{years}} = F^{-1}(0.99) = \xi - \beta \ln\big(-\ln F_{100\,\mathrm{years}}\big).$$
This can also be obtained graphically from the standard Gumbel CDF.
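A sketch of this inversion; `gumbel_max_return_value` is a hypothetical helper, with the standard Gumbel ($\xi = 0$, $\beta = 1$) as the default:

```python
# Return value for a T-year return period of the maximum-Gumbel distribution:
# F = 1 - 1/T, and x_T = xi - beta*ln(-ln F).
import numpy as np

def gumbel_max_return_value(T_years, xi=0.0, beta=1.0):
    """Value exceeded on average once every T_years (standard Gumbel by default)."""
    F = 1.0 - 1.0 / T_years
    return xi - beta * np.log(-np.log(F))

print(gumbel_max_return_value(10))    # ~2.25, matching the ~2.3 read off the CDF plot
print(gumbel_max_return_value(100))   # ~4.6 for the 100-year return value
```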

If instead of looking for the maximum extreme event we are looking for the minimum (e.g., the coldest) extreme event, we have to reverse the normalized variable from $-\dfrac{x - \xi}{\beta}$ to $+\dfrac{x - \xi}{\beta}$. The Gumbel minimum distributions become (with the standard version in parentheses):
$$\text{PDF:}\quad f(x) = \frac{1}{\beta}\exp\left[-\exp\left(\frac{x - \xi}{\beta}\right) + \frac{x - \xi}{\beta}\right] \quad \left(= e^{-e^{x} + x}\right)$$
The integral of $e^{-e^{x} + x}$ is $-e^{-e^{x}} + \text{const}$, so that
$$\text{CDF:}\quad F(x) = 1 - \exp\left[-\exp\left(\frac{x - \xi}{\beta}\right)\right] \quad \left(= 1 - e^{-e^{x}}\right)$$

[Figure: the standard minimum Gumbel PDF and CDF.]

Return year, for the minimum Gumbel distribution: how many years do we need to wait to see $x \le X$ happen again?
$$P(x \le -2.3) = \mathrm{CDF}(x = -2.3) = 0.1; \qquad \text{return year} = \frac{1}{\mathrm{CDF}} = 10 \text{ years.}$$
The return time of 10 years (marked in blue in the figure) corresponds to the extreme value that has a cumulative probability of 0.1 (for the minimum) or 0.9 (for the maximum).

Example of the PDF and CDF for the Gumbel (maximum) distribution with $\beta = 1$, $\xi = 1.79 = \ln 6$.

Note: the return time is computed from the CDF. The CDF probability 0.5 (which means that on average it happens every other year) corresponds to a return time of 2 years, 0.9 to 10 years, etc. The PDF is only used to compare with a histogram.

[Figure: Gumbel (maximum) PDF and CDF with $\beta = 1$, $\xi = 1.79$; the secondary axis shows return times of 10, 100, and 1000 years.]

Multivariate Gaussian distributions

For multivariate problems, Gaussian distributions are in practice used almost exclusively. For one (scalar) variable, the Gaussian distribution can be written as
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}(x-\mu)\,\sigma^{-2}\,(x-\mu)} \qquad \text{or} \qquad f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} z\,z}.$$
For two variables,
$$f(x_1, x_2) = \frac{1}{2\pi\,|\Sigma|^{1/2}}\, \exp\left[-\frac{1}{2}\begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix}^{T} \Sigma^{-1} \begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix}\right]$$

where
$$\Sigma = \begin{pmatrix} \sigma_1^2 & \overline{x_1' x_2'} \\ \overline{x_2' x_1'} & \sigma_2^2 \end{pmatrix}$$
is the covariance matrix. With two standardized variables,
$$f(z_1, z_2) = \frac{1}{2\pi\,|R|^{1/2}}\, \exp\left[-\frac{1}{2}\begin{pmatrix} z_1 \\ z_2 \end{pmatrix}^{T} R^{-1} \begin{pmatrix} z_1 \\ z_2 \end{pmatrix}\right], \qquad \text{where } R = \begin{pmatrix} 1 & \rho_{12} \\ \rho_{21} & 1 \end{pmatrix}$$
is the correlation matrix.

For $k$ variables, we define a vector $\mathbf{x} = (x_1, \dots, x_k)^T$ and
$$f(\mathbf{x}) = \frac{1}{(2\pi)^{k/2}\,|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{T} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})}.$$
For standardized variables,
$$f(\mathbf{z}) = \frac{1}{(2\pi)^{k/2}\,|R|^{1/2}}\, e^{-\frac{1}{2} \mathbf{z}^{T} R^{-1} \mathbf{z}},$$
where
$$\Sigma = \begin{pmatrix} \sigma_1^2 & \cdots & \overline{x_1' x_k'} \\ \vdots & \ddots & \vdots \\ \overline{x_k' x_1'} & \cdots & \sigma_k^2 \end{pmatrix} \qquad \text{and} \qquad R = \begin{pmatrix} 1 & \cdots & \rho_{1k} \\ \vdots & \ddots & \vdots \\ \rho_{k1} & \cdots & 1 \end{pmatrix}$$
are the covariance and correlation matrices, respectively. See Figures 4./4.5 of Wilks showing a bivariate distribution.
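A sketch evaluating the bivariate Gaussian density directly from $\Sigma$ and checking it against SciPy ($\boldsymbol{\mu}$, $\Sigma$, and the evaluation point are arbitrary illustrative values):

```python
# Bivariate Gaussian density from the covariance matrix, checked against SciPy.
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])        # covariance matrix; off-diagonals are the x1'x2' averages

x = np.array([1.5, 1.0])
d = x - mu
k = mu.size
dens = np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d) / np.sqrt((2 * np.pi) ** k * np.linalg.det(Sigma))
print(dens)
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))   # same number
```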