# Part 2: One-parameter models



## 2. Bernoulli/binomial models

Return to iid $Y_1,\dots,Y_n \sim \text{Bin}(1,\theta)$. The sampling model/likelihood is

$$p(y_1,\dots,y_n \mid \theta) = \theta^{\sum y_i}(1-\theta)^{n-\sum y_i}$$

When combined with a prior $p(\theta)$, Bayes' rule gives the posterior

$$p(\theta \mid y_1,\dots,y_n) = \frac{\theta^{\sum y_i}(1-\theta)^{n-\sum y_i}\,p(\theta)}{p(y_1,\dots,y_n)}$$

## 3. Posterior odds ratio

If we compare the relative probabilities of any two $\theta$-values, say $\theta_a$ and $\theta_b$, we see that

$$\frac{p(\theta_a \mid y_1,\dots,y_n)}{p(\theta_b \mid y_1,\dots,y_n)}
= \frac{\theta_a^{\sum y_i}(1-\theta_a)^{n-\sum y_i}\,p(\theta_a)/p(y_1,\dots,y_n)}{\theta_b^{\sum y_i}(1-\theta_b)^{n-\sum y_i}\,p(\theta_b)/p(y_1,\dots,y_n)}
= \left(\frac{\theta_a}{\theta_b}\right)^{\sum y_i}\left(\frac{1-\theta_a}{1-\theta_b}\right)^{n-\sum y_i}\frac{p(\theta_a)}{p(\theta_b)}$$

This shows that the probability density at $\theta_a$ relative to that at $\theta_b$ depends on $y_1,\dots,y_n$ only through $\sum y_i$.

## 4. Sufficient statistics

From this it is possible to show that

$$P(\theta \in A \mid Y_1=y_1,\dots,Y_n=y_n) = P\!\left(\theta \in A \;\Big|\; \sum_{i=1}^n Y_i = \sum_{i=1}^n y_i\right)$$

We interpret this as meaning that $\sum y_i$ contains all the information about $\theta$ available from the data; we say that $\sum y_i$ is a sufficient statistic for $\theta$ and for $p(y_1,\dots,y_n \mid \theta)$.

## 5. The binomial model

When $Y_1,\dots,Y_n \mid \theta \stackrel{\text{iid}}{\sim} \text{Bin}(1,\theta)$, we saw that the sufficient statistic $Y = \sum_{i=1}^n Y_i$ has a $\text{Bin}(n,\theta)$ distribution. Having observed $\{Y=y\}$, our task is to obtain the posterior distribution of $\theta$:

$$p(\theta \mid y) = \frac{p(y\mid\theta)\,p(\theta)}{p(y)} = \frac{\binom{n}{y}\theta^y(1-\theta)^{n-y}\,p(\theta)}{p(y)} = c(y)\,\theta^y(1-\theta)^{n-y}\,p(\theta)$$

where $c(y)$ is a function of $y$ and not $\theta$.

## 6. A uniform prior

The parameter $\theta$ is some unknown number between 0 and 1. Suppose our prior information for $\theta$ is such that all subintervals of $[0,1]$ having the same length also have the same probability:

$$P(a \le \theta \le b) = P(a+c \le \theta \le b+c), \quad \text{for } 0 \le a < b < b+c \le 1$$

This condition implies a uniform density for $\theta$: $p(\theta) = 1$ for all $\theta \in [0,1]$.

## 7. Normalizing constant

Using the following result from calculus,

$$\int_0^1 \theta^{a-1}(1-\theta)^{b-1}\, d\theta = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}$$

where $\Gamma(n) = (n-1)!$ is evaluated in R as gamma(n), we can find out what $c(y)$ is under the uniform prior. Since $\int_0^1 p(\theta\mid y)\, d\theta = 1$,

$$c(y) = \frac{\Gamma(n+2)}{\Gamma(y+1)\Gamma(n-y+1)}$$
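As a quick numerical sanity check (a Python sketch rather than R; the helper names are ours), we can evaluate $c(y)$ on the log scale and confirm that it really normalizes the posterior:

```python
import math

def log_c(n, y):
    # log of c(y) = Gamma(n+2) / (Gamma(y+1) Gamma(n-y+1)),
    # computed on the log scale to avoid overflow for large n
    return math.lgamma(n + 2) - math.lgamma(y + 1) - math.lgamma(n - y + 1)

def posterior_density(theta, n, y):
    # p(theta | y) under a uniform prior: c(y) * theta^y * (1-theta)^(n-y)
    return math.exp(log_c(n, y) + y * math.log(theta) + (n - y) * math.log(1 - theta))

n, y = 10, 2
# c(y) = Gamma(12)/(Gamma(3)Gamma(9)) = 11!/(2! 8!) = 495
print(round(math.exp(log_c(n, y))))  # 495

# Midpoint-rule check that the density integrates to (approximately) 1
grid = [(i + 0.5) / 10000 for i in range(10000)]
total = sum(posterior_density(t, n, y) for t in grid) / 10000
print(round(total, 4))  # 1.0
```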

## 8. Beta posterior

So the posterior distribution is

$$p(\theta\mid y) = \frac{\Gamma(n+2)}{\Gamma(y+1)\Gamma(n-y+1)}\,\theta^y(1-\theta)^{n-y}
= \frac{\Gamma(n+2)}{\Gamma(y+1)\Gamma(n-y+1)}\,\theta^{(y+1)-1}(1-\theta)^{(n-y+1)-1}
= \text{Beta}(y+1,\, n-y+1)$$

To wrap up: a uniform prior plus a Bernoulli/binomial likelihood (sampling model) gives a Beta posterior.

## 9. Example: Happiness

Each female of age 65 or over in the 1998 General Social Survey was asked whether or not they were generally happy. Let $Y_i = 1$ if respondent $i$ reported being generally happy, and $Y_i = 0$ otherwise, for $i = 1,\dots,n = 129$ individuals. Since $129 \ll N$, the total size of the female senior citizen population, our joint beliefs about $Y_1,\dots,Y_{129}$ are well approximated by (the sampling model):

- our beliefs about $\theta = \sum_{i=1}^N Y_i / N$
- the model that, conditional on $\theta$, the $Y_i$'s are iid Bernoulli RVs with expectation $\theta$

## 10. Example: Happiness (iid Bernoulli likelihood)

This sampling model says that the probability of any potential outcome $\{y_1,\dots,y_{129}\}$, conditional on $\theta$, is given by

$$p(y_1,\dots,y_{129}\mid\theta) = \theta^{\sum_{i=1}^{129} y_i}(1-\theta)^{129-\sum_{i=1}^{129} y_i}$$

The survey revealed that 118 individuals report being generally happy (91%) and 11 individuals do not (9%). So the probability of these data for a given value of $\theta$ is

$$p(y_1,\dots,y_{129}\mid\theta) = \theta^{118}(1-\theta)^{11}$$

## 11. Example: Happiness (binomial likelihood)

We could work with this iid Bernoulli likelihood directly. But for a sense of the bigger picture, we'll use the fact that

$$Y = \sum_{i=1}^n Y_i \sim \text{Bin}(n,\theta)$$

In this case, our observed data is $y = 118$. So (now) the probability of these data for a given value of $\theta$ is

$$p(y\mid\theta) = \binom{129}{118}\theta^{118}(1-\theta)^{11}$$

Notice that this is within a constant factor of the previous Bernoulli likelihood.

## 12. Example: Happiness (posterior)

If we take a uniform prior for $\theta$, expressing our ignorance, then we know the posterior is $p(\theta\mid y) = \text{Beta}(y+1,\, n-y+1)$. For $n = 129$ and $y = 118$, this gives

$$\theta \mid \{Y=118\} \sim \text{Beta}(119, 12)$$

[Figure: prior (flat) and posterior densities over $\theta$.]

## 13. Uniform is Beta

The uniform prior has $p(\theta) = 1$ for all $\theta \in [0,1]$. This can be thought of as a Beta distribution with parameters $a = 1$, $b = 1$:

$$p(\theta) = \frac{\Gamma(2)}{\Gamma(1)\Gamma(1)}\,\theta^{1-1}(1-\theta)^{1-1} = 1$$

(under the convention that $\Gamma(1) = 1$), i.e. $\theta \sim \text{Beta}(1,1)$. We saw that if $Y\mid\theta \sim \text{Bin}(n,\theta)$ then $\theta\mid\{Y=y\} \sim \text{Beta}(1+y,\, 1+n-y)$.

## 14. General Beta prior

Suppose $\theta \sim \text{Beta}(a,b)$ and $Y\mid\theta \sim \text{Bin}(n,\theta)$. Then

$$p(\theta\mid y) \propto \theta^{a+y-1}(1-\theta)^{b+n-y-1} \quad\Rightarrow\quad \theta\mid y \sim \text{Beta}(a+y,\, b+n-y)$$

In other words, the constant of proportionality must be

$$c(n,y,a,b) = \frac{\Gamma(a+b+n)}{\Gamma(a+y)\Gamma(b+n-y)}$$

How do I know? $p(\theta\mid y)$ (and the Beta density) must integrate to 1.

## 15. Posterior proportionality

We will use this trick over and over again! I.e., if we recognize that the posterior distribution is proportional to a known probability density, then it must be identical to that density. But careful! The constant of proportionality must be constant with respect to $\theta$.

## 16. Conjugacy

We have shown that a Beta prior distribution and a binomial sampling model lead to a Beta posterior distribution. To reflect this, we say that the class of Beta priors is conjugate for the binomial sampling model. Formally, we say that a class $\mathcal{P}$ of prior distributions for $\theta$ is conjugate for a sampling model $p(y\mid\theta)$ if

$$p(\theta) \in \mathcal{P} \;\Rightarrow\; p(\theta\mid y) \in \mathcal{P}$$

Conjugate priors make posterior calculations easy, but might not actually represent our prior information.

## 17. Posterior summaries

If we are interested in point summaries of our posterior inference, then the (full) distribution gives us many points to choose from. E.g., if $\theta\mid\{Y=y\} \sim \text{Beta}(a+y,\, b+n-y)$ then

$$E\{\theta\mid y\} = \frac{a+y}{a+b+n}, \qquad
\text{mode}(\theta\mid y) = \frac{a+y-1}{a+b+n-2}, \qquad
\text{Var}[\theta\mid y] = \frac{E\{\theta\mid y\}\,E\{1-\theta\mid y\}}{a+b+n+1}$$

not including quantiles, etc.
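For the happiness posterior $\text{Beta}(119, 12)$ (i.e. $a = b = 1$, $n = 129$, $y = 118$), these formulas evaluate as follows (a small Python sketch, not part of the slides):

```python
a, b, n, y = 1, 1, 129, 118  # uniform prior plus the happiness data

post_mean = (a + y) / (a + b + n)                          # E{theta | y}
post_mode = (a + y - 1) / (a + b + n - 2)                  # mode(theta | y)
post_var = post_mean * (1 - post_mean) / (a + b + n + 1)   # Var[theta | y]

print(round(post_mean, 3))  # 0.908
print(round(post_mode, 3))  # 0.915
print(round(post_var, 5))   # 0.00063
```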

## 18. Combining information

The posterior expectation $E\{\theta\mid y\}$ is easily recognized as a combination of prior and data information:

$$E\{\theta\mid y\} = \frac{a+b}{a+b+n}\cdot\underbrace{\frac{a}{a+b}}_{\text{prior expectation}} \;+\; \frac{n}{a+b+n}\cdot\underbrace{\frac{y}{n}}_{\text{data average}}$$

I.e., for this model and prior distribution, the posterior mean is a weighted average of the prior mean and the sample average, with weights proportional to $a+b$ and $n$ respectively.

## 19. Prior sensitivity

This leads to the interpretation of $a$ and $b$ as "prior data":

- $a$: prior number of 1's
- $b$: prior number of 0's
- $a+b$: prior sample size

If $n \gg a+b$, then it seems reasonable that most of our information should be coming from the data as opposed to the prior. Indeed:

$$\frac{a+b}{a+b+n} \to 0 \quad\Rightarrow\quad E\{\theta\mid y\} \to \frac{y}{n} \quad\text{and}\quad \text{Var}[\theta\mid y] \to \frac{1}{n}\,\frac{y}{n}\left(1-\frac{y}{n}\right)$$

## 20. Prediction

An important feature of Bayesian inference is the existence of a predictive distribution for new observations. Revert, for the moment, to our notation for Bernoulli data. Let $y_1,\dots,y_n$ be the outcomes of a sample of $n$ binary RVs, and let $\tilde{Y} \in \{0,1\}$ be an additional outcome from the same population that has yet to be observed. The predictive distribution of $\tilde{Y}$ is the conditional distribution of $\tilde{Y}$ given $\{Y_1=y_1,\dots,Y_n=y_n\}$.

## 21. Predictive distribution

For conditionally iid binary variables, the predictive distribution can be derived from the distribution of $\tilde{Y}$ given $\theta$ and the posterior distribution of $\theta$:

$$p(\tilde{y}=1\mid y_1,\dots,y_n) = E\{\theta\mid y_1,\dots,y_n\} = \frac{a+\sum_{i=1}^n y_i}{a+b+n}$$

$$p(\tilde{y}=0\mid y_1,\dots,y_n) = 1 - p(\tilde{y}=1\mid y_1,\dots,y_n) = \frac{b+n-\sum_{i=1}^n y_i}{a+b+n}$$
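The first identity says: integrate $P(\tilde{Y}=1\mid\theta)=\theta$ against the posterior, which gives the posterior mean. A Monte Carlo sketch of that integration (Python stdlib rather than R; variable names are ours), using the happiness data:

```python
import random

random.seed(1)
a, b, n, s = 1, 1, 129, 118  # uniform prior; s = sum of the observed y_i

# Closed form: p(ytilde = 1 | y_1..y_n) = (a + s)/(a + b + n)
exact = (a + s) / (a + b + n)

# Monte Carlo: draw theta from the Beta(a+s, b+n-s) posterior,
# then draw ytilde ~ Bernoulli(theta), and average the outcomes
draws = 100_000
hits = 0
for _ in range(draws):
    theta = random.betavariate(a + s, b + n - s)
    hits += 1 if random.random() < theta else 0
approx = hits / draws

print(round(exact, 3))   # 0.908
print(round(approx, 2))  # close to 0.91
```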

## 22. Two points about prediction

1. The predictive distribution does not depend upon any unknown quantities; if it did, we would not be able to use it to make predictions.
2. The predictive distribution depends on our observed data, i.e. $\tilde{Y}$ is not independent of $Y_1,\dots,Y_n$; this is because observing $Y_1,\dots,Y_n$ gives information about $\theta$, which in turn gives information about $\tilde{Y}$.

## 23. Example: Posterior predictive

Consider a uniform prior, $\theta \sim \text{Beta}(1,1)$. Using $Y = \sum_{i=1}^n y_i$,

$$P(\tilde{Y}=1\mid Y=y) = E\{\theta\mid Y=y\} = \frac{1+y}{2+n} = \frac{2}{2+n}\cdot\frac{1}{2} + \frac{n}{2+n}\cdot\frac{y}{n}$$

but $\text{mode}(\theta\mid Y=y) = y/n$. Does this discrepancy between these two posterior summaries of our information make sense? Consider the case in which $Y = 0$, for which $\text{mode}(\theta\mid Y=0) = 0$ but $P(\tilde{Y}=1\mid Y=0) = 1/(2+n)$.

## 24. Confidence regions

An interval $[l(y), u(y)]$, based on the observed data $Y=y$, has 95% Bayesian coverage for $\theta$ if

$$P(l(y) < \theta < u(y) \mid Y=y) = 0.95$$

The interpretation of this interval is that it describes your information about the true value of $\theta$ after you have observed $Y=y$. Such intervals are typically called credible intervals, to distinguish them from frequentist confidence intervals: an interval (based upon $\hat\theta$) that describes a region wherein the true $\theta_0$ lies 95% of the time under repeated sampling. Both, confusingly, use the acronym CI.

## 25. Quantile-based (Bayesian) CI

Perhaps the easiest way to obtain a credible interval is to use the posterior quantiles. To make a $100(1-\alpha)\%$ quantile-based CI, find numbers $\theta_{\alpha/2} < \theta_{1-\alpha/2}$ such that

1. $P(\theta < \theta_{\alpha/2} \mid Y=y) = \alpha/2$
2. $P(\theta > \theta_{1-\alpha/2} \mid Y=y) = \alpha/2$

The numbers $\theta_{\alpha/2}$, $\theta_{1-\alpha/2}$ are the $\alpha/2$ and $1-\alpha/2$ posterior quantiles of $\theta$.

## 26. Example: Binomial sampling and uniform prior

Suppose out of $n=10$ conditionally independent draws of a binary random variable we observe $Y=2$ ones. Using a uniform prior distribution for $\theta$, the posterior distribution is $\theta\mid\{Y=2\} \sim \text{Beta}(1+2,\, 1+8)$. A 95% CI can be obtained from the 0.025 and 0.975 quantiles of this Beta distribution. These quantiles are 0.06 and 0.52, respectively, and so the posterior probability that $\theta \in [0.06, 0.52]$ is 95%.
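In R these quantiles come from qbeta(c(0.025, 0.975), 3, 9); a stdlib-Python sketch (our own, not from the slides) gets the same numbers by Monte Carlo:

```python
import random

random.seed(2)
# Monte Carlo approximation to the 0.025 and 0.975 quantiles of Beta(3, 9)
draws = sorted(random.betavariate(3, 9) for _ in range(200_000))
lo = draws[int(0.025 * len(draws))]
hi = draws[int(0.975 * len(draws))]

print(round(lo, 2), round(hi, 2))  # approximately 0.06 and 0.52
```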

## 27. Example: a quantile-based CI drawback

Notice that there are $\theta$-values outside the quantile-based CI that have higher probability [density] than some points inside the interval.

[Figure: posterior density with the 95% quantile-based CI marked.]

## 28. HPD region

The quantile-based drawback suggests a more restrictive type of interval may be warranted. A $100(1-\alpha)\%$ high posterior density (HPD) region consists of a subset $s(y)$ of the parameter space such that

1. $P(\theta \in s(y) \mid Y=y) = 1-\alpha$
2. if $\theta_a \in s(y)$ and $\theta_b \notin s(y)$, then $p(\theta_a\mid Y=y) > p(\theta_b\mid Y=y)$

All points in an HPD region have a higher posterior density than points outside the region.

## 29. Constructing an HPD region

However, an HPD region might not be an interval if the posterior density is multimodal. When it is an interval, the basic idea behind its construction is as follows:

1. Gradually move a horizontal line down across the density, including in the HPD region all of the $\theta$-values having a density above the horizontal line.
2. Stop when the posterior probability of the $\theta$-values in the region reaches $1-\alpha$.
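This "waterline" construction is easy to mimic on a grid: sort grid points by density and keep the highest-density points until they account for $1-\alpha$ of the mass. A Python sketch for the Beta(3, 9) posterior from the earlier example (function names are ours):

```python
import math

def beta_pdf(t, a, b):
    # density of Beta(a, b) at t
    return math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                    + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t))

def hpd_interval(a, b, alpha=0.05, m=10_000):
    # evaluate the density on a grid, then lower the "waterline":
    # keep grid points in order of decreasing density until the
    # retained mass reaches 1 - alpha
    grid = [(i + 0.5) / m for i in range(m)]
    dens = [beta_pdf(t, a, b) for t in grid]
    total = sum(dens)
    kept, mass = [], 0.0
    for d, t in sorted(zip(dens, grid), reverse=True):
        kept.append(t)
        mass += d / total
        if mass >= 1 - alpha:
            break
    return min(kept), max(kept)

lo, hi = hpd_interval(3, 9)
print(round(lo, 2), round(hi, 2))  # approximately [0.04, 0.48]
```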

## 30. Example: HPD region

Numerically, we may collect the highest-density points whose cumulative probability is at least $1-\alpha$.

[Figure: posterior density with the 95% quantile-based CI and 50%, 75%, and 95% HPD regions marked.]

The 95% HPD region, [0.04, 0.48], is narrower than the quantile-based CI, yet both contain 95% probability.

## 31. Example: Estimating the probability of a female birth

The proportion of births that are female has long been a topic of interest both scientifically and to the lay public. E.g., Laplace, perhaps the first Bayesian, analysed Parisian birth data from 1745 with a uniform prior and a binomial sampling model: 241,945 girls and 251,527 boys.

## 32. Example: female birth posterior

The posterior distribution for $\theta$ is (with $n = 241945 + 251527 = 493472$ and $y = 241945$)

$$\theta\mid\{Y=241945\} \sim \text{Beta}(241946,\, 251528)$$

Now, we can use the posterior CDF to calculate

$$P(\theta > 0.5 \mid Y=241945) = \texttt{pbeta(0.5, 241946, 251528, lower.tail=FALSE)}$$

which is vanishingly small (on the order of $10^{-42}$).
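The R call above is exact; a stdlib-Python sketch can recover the order of magnitude with a normal approximation to the Beta posterior (an approximation we are introducing here, not part of the slides):

```python
import math

a, b = 241946, 251528  # Beta posterior parameters

mean = a / (a + b)
sd = math.sqrt(mean * (1 - mean) / (a + b + 1))

# Normal-approximation upper tail P(theta > 0.5)
z = (0.5 - mean) / sd
p_upper = 0.5 * math.erfc(z / math.sqrt(2))

print(round(mean, 4))   # 0.4903
print(p_upper < 1e-30)  # True: essentially zero
```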

## 33. Example: female birth with placenta previa

Placenta previa is an unusual condition of pregnancy in which the placenta is implanted very low in the uterus, obstructing the fetus from a normal vaginal delivery. An early study concerning the sex of placenta previa births in Germany found that of a total of 980 births, 437 were female. How much evidence does this provide for the claim that the proportion of female births in the population of placenta previa births is less than 0.485, the proportion of female births in the general population?

## 34. Example: female birth with placenta previa

The posterior distribution is

$$\theta\mid\{Y=437\} \sim \text{Beta}(438,\, 544)$$

The summary statistics are

$$E\{\theta\mid Y=437\} = 0.446, \qquad \text{sd}[\theta\mid Y=437] = 0.016, \qquad \text{med}(\theta\mid Y=437) = 0.446$$

The central 95% CI is $[0.415, 0.476]$, and $P(\theta > 0.485 \mid Y=437) = 0.007$.

## 35. The Poisson model

Some measurements, such as a person's number of children or number of friends, have values that are whole numbers. In these cases the sample space is $\mathcal{Y} = \{0,1,2,\dots\}$. Perhaps the simplest probability model on $\mathcal{Y}$ is the Poisson model. A RV $Y$ has a Poisson distribution with mean $\theta$ if

$$P(Y=y\mid\theta) = \frac{\theta^y e^{-\theta}}{y!} \quad \text{for } y \in \{0,1,2,\dots\}$$

## 36. Mean-variance relationship

Poisson RVs have the property that

$$E\{Y\mid\theta\} = \text{Var}[Y\mid\theta] = \theta$$

People sometimes say that the Poisson family of distributions has a mean-variance relationship because if one Poisson distribution has a larger mean than another, it will have a larger variance as well.

## 37. IID sampling model

If we take $Y_1,\dots,Y_n \mid \theta \stackrel{\text{iid}}{\sim} \text{Pois}(\theta)$, then the joint PMF is

$$P(Y_1=y_1,\dots,Y_n=y_n\mid\theta) = \prod_{i=1}^n p(y_i\mid\theta) = \prod_{i=1}^n \frac{1}{y_i!}\,\theta^{y_i} e^{-\theta} = c(y_1,\dots,y_n)\,\theta^{\sum y_i} e^{-n\theta}$$

## 38. Posterior odds ratio

Comparing two values of $\theta$ a posteriori, we have

$$\frac{p(\theta_a\mid y_1,\dots,y_n)}{p(\theta_b\mid y_1,\dots,y_n)}
= \frac{c(y_1,\dots,y_n)\,e^{-n\theta_a}\,\theta_a^{\sum y_i}\,p(\theta_a)}{c(y_1,\dots,y_n)\,e^{-n\theta_b}\,\theta_b^{\sum y_i}\,p(\theta_b)}
= \frac{e^{-n\theta_a}}{e^{-n\theta_b}}\left(\frac{\theta_a}{\theta_b}\right)^{\sum y_i}\frac{p(\theta_a)}{p(\theta_b)}$$

As in the case of the iid Bernoulli model, $\sum_{i=1}^n Y_i$ is a sufficient statistic. Furthermore, $\sum_{i=1}^n Y_i \sim \text{Pois}(n\theta)$.

## 39. Conjugate prior

Recall that a class of priors is conjugate for a sampling model if the posterior is also in that class. For the Poisson model, our posterior distribution for $\theta$ has the following form:

$$p(\theta\mid y_1,\dots,y_n) \propto p(y_1,\dots,y_n\mid\theta)\,p(\theta) \propto \theta^{\sum y_i} e^{-n\theta}\, p(\theta)$$

This means that whatever our conjugate class of densities is, it will have to include terms like $\theta^{c_1} e^{-c_2\theta}$ for numbers $c_1$ and $c_2$.

## 40. Conjugate gamma prior

The simplest class of probability distributions matching this description is the gamma family:

$$p(\theta) = \frac{b^a}{\Gamma(a)}\,\theta^{a-1} e^{-b\theta} \quad \text{for } \theta, a, b > 0$$

Some properties include

$$E\{\theta\} = \frac{a}{b}, \qquad \text{Var}[\theta] = \frac{a}{b^2}, \qquad
\text{mode}(\theta) = \begin{cases} (a-1)/b & \text{if } a > 1 \\ 0 & \text{if } a \le 1 \end{cases}$$

We shall denote this family as $G(a,b)$.

## 41. Posterior distribution

Suppose $Y_1,\dots,Y_n\mid\theta \stackrel{\text{iid}}{\sim} \text{Pois}(\theta)$ and $\theta \sim G(a,b)$. Then

$$p(\theta\mid y_1,\dots,y_n) \propto \theta^{a+\sum y_i - 1}\, e^{-(b+n)\theta}$$

This is evidently a gamma distribution, and we have confirmed the conjugacy of the gamma family for the Poisson sampling model:

$$\theta \sim G(a,b), \quad Y_1,\dots,Y_n\mid\theta \sim \text{Pois}(\theta) \quad\Rightarrow\quad \theta\mid\{Y_1,\dots,Y_n\} \sim G\!\left(a+\sum_{i=1}^n Y_i,\; b+n\right)$$

## 42. Combining information

Estimation and prediction proceed in a manner similar to that in the binomial model. The posterior expectation of $\theta$ is a convex combination of the prior expectation and the sample average:

$$E\{\theta\mid y_1,\dots,y_n\} = \frac{b}{b+n}\cdot\frac{a}{b} + \frac{n}{b+n}\cdot\frac{\sum y_i}{n}$$

- $b$ is the number of prior observations
- $a$ is the sum of the counts from $b$ prior observations

For large $n$, the information in the data dominates: for $n \gg b$,

$$E\{\theta\mid y_1,\dots,y_n\} \approx \bar{y}, \qquad \text{Var}[\theta\mid y_1,\dots,y_n] \approx \frac{\bar{y}}{n}$$
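A quick arithmetic check of this convex-combination identity (a Python sketch with made-up numbers: $a=2$, $b=1$, and ten counts summing to 23):

```python
a, b = 2, 1    # G(a, b) prior, so the prior mean a/b = 2
n, s = 10, 23  # n observations with sum s, so ybar = 2.3

# Direct posterior mean of the G(a + s, b + n) posterior
direct = (a + s) / (b + n)

# Convex combination: weight b/(b+n) on the prior mean, n/(b+n) on ybar
combo = (b / (b + n)) * (a / b) + (n / (b + n)) * (s / n)

print(round(direct, 4))  # 2.2727
print(abs(direct - combo) < 1e-12)  # True: the two expressions agree
```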

## 43. Posterior predictive

Predictions about additional data can be obtained with the posterior predictive distribution:

$$p(\tilde{y}\mid y_1,\dots,y_n)
= \frac{(b+n)^{a+\sum y_i}}{\Gamma(\tilde{y}+1)\,\Gamma(a+\sum y_i)} \int_0^\infty \theta^{a+\sum y_i+\tilde{y}-1}\, e^{-(b+n+1)\theta}\, d\theta
= \frac{\Gamma(a+\sum y_i+\tilde{y})}{\Gamma(\tilde{y}+1)\,\Gamma(a+\sum y_i)} \left(\frac{b+n}{b+n+1}\right)^{a+\sum y_i} \left(\frac{1}{b+n+1}\right)^{\tilde{y}}$$

for $\tilde{y} \in \{0,1,2,\dots\}$.

## 44. Posterior predictive

This is a negative binomial distribution with parameters $(a+\sum y_i,\; b+n)$, for which

$$E\{\tilde{Y}\mid y_1,\dots,y_n\} = \frac{a+\sum y_i}{b+n} = E\{\theta\mid y_1,\dots,y_n\}$$

$$\text{Var}[\tilde{Y}\mid y_1,\dots,y_n] = \frac{a+\sum y_i}{b+n}\cdot\frac{b+n+1}{b+n}
= \text{Var}[\theta\mid y_1,\dots,y_n]\,(b+n+1)
= E\{\theta\mid y_1,\dots,y_n\}\,\frac{b+n+1}{b+n}$$

## 45. Example: birth rates and education

Over the course of the 1990s the General Social Survey gathered data on the educational attainment and number of children of 155 women who were 40 years of age at the time of their participation in the survey. These women were in their 20s in the 1970s, a period of historically low fertility rates in the United States. In this example we will compare the women with college degrees to those without in terms of their numbers of children.

## 46. Example: sampling model(s)

Let $Y_{1,1},\dots,Y_{n_1,1}$ denote the numbers of children for the $n_1$ women without college degrees, and $Y_{1,2},\dots,Y_{n_2,2}$ be the data for the $n_2$ women with degrees. For this example, we will use the following sampling models:

$$Y_{1,1},\dots,Y_{n_1,1} \stackrel{\text{iid}}{\sim} \text{Pois}(\theta_1), \qquad Y_{1,2},\dots,Y_{n_2,2} \stackrel{\text{iid}}{\sim} \text{Pois}(\theta_2)$$

The (sufficient) data are:

- no bachelor's: $n_1 = 111$, $\sum_{i=1}^{n_1} y_{i,1} = 217$, $\bar{y}_1 = 1.95$
- bachelor's: $n_2 = 44$, $\sum_{i=1}^{n_2} y_{i,2} = 66$, $\bar{y}_2 = 1.50$

## 47. Example: prior(s) and posterior(s)

In the case where $\theta_1, \theta_2 \stackrel{\text{iid}}{\sim} G(a=2,\, b=1)$, we have the following posterior distributions:

$$\theta_1\mid\{n_1=111,\ \textstyle\sum Y_{i,1}=217\} \sim G(2+217,\; 1+111)$$
$$\theta_2\mid\{n_2=44,\ \textstyle\sum Y_{i,2}=66\} \sim G(2+66,\; 1+44)$$

Posterior means, modes and 95% quantile-based CIs for $\theta_1$ and $\theta_2$ can be obtained from their gamma posterior distributions.

## 48. Example: prior(s) and posterior(s)

[Figure: prior density and the two posterior densities ("none" and "bach") over $\theta$.]

The posteriors provide substantial evidence that $\theta_1 > \theta_2$. E.g.,

$$P(\theta_1 > \theta_2 \mid \textstyle\sum Y_{i,1}=217,\ \sum Y_{i,2}=66) = 0.97$$
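That probability is easy to approximate by Monte Carlo: draw $(\theta_1, \theta_2)$ independently from their gamma posteriors and count how often $\theta_1 > \theta_2$. A stdlib-Python sketch (not from the slides), taking the prior to be $G(2,1)$ consistent with the stated posteriors; note that Python's random.gammavariate takes a scale, the reciprocal of our rate $b$:

```python
import random

random.seed(3)
a, b = 2, 1            # G(2, 1) prior for both groups
n1, s1 = 111, 217      # no bachelor's: n and sum of counts
n2, s2 = 44, 66        # bachelor's

draws = 100_000
wins = 0
for _ in range(draws):
    # gammavariate(shape, scale), with scale = 1/rate
    t1 = random.gammavariate(a + s1, 1 / (b + n1))
    t2 = random.gammavariate(a + s2, 1 / (b + n2))
    wins += 1 if t1 > t2 else 0

print(round(wins / draws, 2))  # close to 0.97
```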

## 49. Example: posterior predictive(s)

Consider a randomly sampled individual from each population. To what extent do we expect the uneducated one to have more children than the other?

[Figure: the two posterior predictive distributions ("none" and "bach") over numbers of children.]

$$P(\tilde{Y}_1 > \tilde{Y}_2 \mid \textstyle\sum Y_{i,1}=217,\ \sum Y_{i,2}=66) = 0.48$$
$$P(\tilde{Y}_1 = \tilde{Y}_2 \mid \textstyle\sum Y_{i,1}=217,\ \sum Y_{i,2}=66) = 0.22$$

The distinction between $\{\theta_1 > \theta_2\}$ and $\{\tilde{Y}_1 > \tilde{Y}_2\}$ is important: strong evidence of a difference between two populations does not mean the difference is large.
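These predictive probabilities can also be approximated by Monte Carlo: draw $\theta_j$ from its posterior, then a new count $\tilde{Y}_j$ from $\text{Pois}(\theta_j)$. A stdlib-Python sketch (the simple Poisson sampler below is Knuth's algorithm, our addition since the stdlib lacks one):

```python
import math
import random

random.seed(4)

def rpois(mu):
    # Knuth's Poisson sampler (fine for small means)
    limit = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

a, b = 2, 1
n1, s1, n2, s2 = 111, 217, 44, 66

draws = 100_000
more = ties = 0
for _ in range(draws):
    # draw theta_j from its posterior, then a new observation ytilde_j
    y1 = rpois(random.gammavariate(a + s1, 1 / (b + n1)))
    y2 = rpois(random.gammavariate(a + s2, 1 / (b + n2)))
    more += y1 > y2
    ties += y1 == y2

print(round(more / draws, 2), round(ties / draws, 2))  # close to 0.48 and 0.22
```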

## 50. Non-informative priors

Priors can be difficult to construct. There has long been a desire for prior distributions that can be guaranteed to play a minimal role in the posterior distribution. Such distributions are sometimes called reference priors, and described as vague, flat, diffuse, or non-informative. The rationale for using non-informative priors is often said to be "to let the data speak for themselves," so that inferences are unaffected by information external to the current data.

## 51. Jeffreys' invariance principle

One approach that is sometimes used to define a non-informative prior distribution was introduced by Jeffreys (1946), based on considering one-to-one transformations of the parameter. Jeffreys' general principle is that any rule for determining the prior density should yield an equivalent posterior if applied to the transformed parameter. Naturally, one must take the sampling model into account in order to study this invariance.

## 52. The Jeffreys prior

Jeffreys' principle leads to defining the non-informative prior as $p(\theta) \propto [i(\theta)]^{1/2}$, where $i(\theta)$ is the Fisher information for $\theta$. Recall that

$$i(\theta) = E_\theta\!\left[\left(\frac{d}{d\theta}\log p(y\mid\theta)\right)^{\!2}\right] = -E_\theta\!\left[\frac{d^2}{d\theta^2}\log p(y\mid\theta)\right]$$

A prior so constructed is called the Jeffreys prior for the sampling model $p(y\mid\theta)$.

## 53. Example: Jeffreys prior for the binomial sampling model

Consider the binomial distribution $Y \sim \text{Bin}(n,\theta)$, which has log-likelihood

$$\log p(y\mid\theta) = c + y\log\theta + (n-y)\log(1-\theta)$$

Therefore, using $E_\theta\{y\} = n\theta$,

$$i(\theta) = -E_\theta\!\left[\frac{d^2}{d\theta^2}\log p(y\mid\theta)\right] = E_\theta\!\left[\frac{y}{\theta^2} + \frac{n-y}{(1-\theta)^2}\right] = \frac{n}{\theta(1-\theta)}$$

So the Jeffreys prior density is then $p(\theta) \propto \theta^{-1/2}(1-\theta)^{-1/2}$, which is the same as $\theta \sim \text{Beta}(\tfrac12, \tfrac12)$.

## 54. Example: non-informative?

This $\theta \sim \text{Beta}(\tfrac12, \tfrac12)$ Jeffreys prior is non-informative in some sense, but not in all senses: the posterior expectation would give $a+b = 1$ sample's worth of weight to the prior mean $\tfrac{a}{a+b} = \tfrac12$. This Jeffreys prior is less informative than the uniform prior, which gives a weight of 2 samples to the prior mean (also $\tfrac12$). In this sense, the least informative prior is $\theta \sim \text{Beta}(0,0)$. But is this a valid prior distribution?

## 55. Propriety of priors

We call a prior density $p(\theta)$ proper if it does not depend on data and is integrable, i.e., $\int_\Theta p(\theta)\, d\theta < \infty$. Proper priors lead to valid joint probability models $p(\theta, y)$ and proper posteriors ($\int_\Theta p(\theta\mid y)\, d\theta < \infty$). Improper priors do not provide valid joint probability models, but they can lead to proper posteriors by proceeding with the Bayesian algebra:

$$p(\theta\mid y) \propto p(y\mid\theta)\,p(\theta)$$

## 56. Example: continued

The $\theta \sim \text{Beta}(0,0)$ prior is improper since

$$\int_0^1 \theta^{-1}(1-\theta)^{-1}\, d\theta = \infty$$

However, the $\text{Beta}(0+y,\, 0+n-y)$ posterior that results is proper as long as $y \neq 0, n$. A $\theta \sim \text{Beta}(\epsilon, \epsilon)$ prior, for $0 < \epsilon \le 1$, is always proper and (in the limit $\epsilon \to 0$) non-informative. In practice, the difference between these alternatives ($\text{Beta}(0,0)$, $\text{Beta}(\tfrac12,\tfrac12)$, $\text{Beta}(1,1)$) is small; all three allow the data (likelihood) to dominate in the posterior.

## 57. Example: Jeffreys prior for the Poisson sampling model

Consider the Poisson distribution $Y \sim \text{Pois}(\theta)$, which has log-likelihood

$$\log p(y\mid\theta) = c + y\log\theta - \theta$$

Therefore, using $E_\theta\{y\} = \theta$,

$$i(\theta) = -E_\theta\!\left[\frac{d^2}{d\theta^2}\log p(y\mid\theta)\right] = E_\theta\!\left[\frac{y}{\theta^2}\right] = \frac{1}{\theta}$$

So the Jeffreys prior density is then $p(\theta) \propto \theta^{-1/2}$.

## 58. Example: improper Jeffreys prior

Since $\int_0^\infty \theta^{-1/2}\, d\theta = \infty$, this prior is improper. However, we can interpret it as a $G(\tfrac12, 0)$, i.e., within the conjugate gamma family, to see that the posterior is

$$G\!\left(\tfrac12 + \sum_{i=1}^n y_i,\; 0 + n\right)$$

This is proper as long as we have observed at least one data point, i.e., $n \ge 1$, since $\int_0^\infty \theta^\alpha e^{-\beta\theta}\, d\theta < \infty$ for all $\alpha > -1$, $\beta > 0$.

## 59. Example: non-informative?

Again, this $G(\tfrac12, 0)$ Jeffreys prior is non-informative in some sense, but not in all senses. The (improper) prior $p(\theta) \propto \theta^{-1}$, or $\theta \sim G(0,0)$, is in some sense equivalent since it gives the same weight ($b = 0$) to the prior mean (although the prior means differ). It can be shown that, in the limit as the prior parameters $(a,b) \to (0,0)$, the posterior expectation approaches the MLE; $p(\theta) \propto \theta^{-1}$ is a typical default for this reason. As before, the practical difference between these choices is small.

## 60. Notes of caution

The search for non-informative priors has several problems, including:

- it can be misguided: if the likelihood (data) is truly dominant, then the choice among relatively flat priors cannot matter
- for many problems there is no clear choice
- difficulties arise when averaging over competing models (Lindley's paradox)

Bottom line: if so few data are available that the choice of non-informative prior matters, then one should put relevant information into the prior.

## 61. Example: heart transplant mortality rate

For a particular hospital, we observe

- $n$: the number of transplant surgeries
- $y$: the number of deaths within 30 days of surgery

In addition, one can predict the probability of death for each patient based upon a model that uses information such as the patient's medical condition before the surgery, gender, and race. From this, one can obtain

- $e$: the expected number of deaths (exposure)

A standard model assumes that the number of deaths follows a Poisson distribution with mean $e\lambda$, and the objective is to estimate the mortality rate per unit exposure, $\lambda$.

## 62. Example: problems with the MLE

The standard estimate of $\lambda$ is the MLE $\hat\lambda = y/e$. Unfortunately, this estimate can be poor when the number of deaths $y$ is close to zero. In this situation, when small death counts are possible, it is desirable to use a Bayesian estimate that incorporates prior knowledge about the size of the mortality rate.

## 63. Example: prior information

A convenient source of prior information is heart transplant data from a small group of hospitals that we believe have the same rate of mortality as the hospital of interest. Suppose for ten hospitals ($j = 1,\dots,10$) we observe

- $z_j$: the number of deaths
- $o_j$: the exposure

where $Z_j \stackrel{\text{ind}}{\sim} \text{Pois}(o_j\lambda)$. If we assign $\lambda$ the non-informative prior $p(\lambda) \propto \lambda^{-1}$ then the posterior is

$$\lambda \mid z_1,\dots,z_{10} \sim G\!\left(\sum_{j=1}^{10} z_j,\; \sum_{j=1}^{10} o_j\right)$$

(check this).

## 64. Example: the prior from data

Suppose that, in this example, we have the prior data

$$\sum_{j=1}^{10} z_j = 16, \qquad \sum_{j=1}^{10} o_j = 15174$$

So for our prior (to analyse the data for our hospital) we shall use $\lambda \sim G(16,\, 15174)$. Now, we consider inference about the heart transplant death rate for two hospitals:

- A. small exposure ($e_a = 66$) and $y_a = 1$ death
- B. larger exposure ($e_b = 1767$) and $y_b = 4$ deaths

## 65. Example: posterior

The posterior distributions are as follows:

- A. $\lambda_a \mid y_a, e_a \sim G(16+1,\; 15174+66)$
- B. $\lambda_b \mid y_b, e_b \sim G(16+4,\; 15174+1767)$

[Figure: prior density and the two posterior densities ("post A" and "post B") over $\lambda$.]

$$P(\lambda_a < \lambda_b \mid y_a, e_a, y_b, e_b) = 0.57$$
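As with the birth-rate example, this comparison is a one-liner by Monte Carlo: draw each rate from its gamma posterior and count. A stdlib-Python sketch (our own; recall gammavariate takes a scale, the reciprocal of the rate):

```python
import random

random.seed(5)
a0, b0 = 16, 15174        # G(16, 15174) prior built from the ten hospitals

# Posteriors: G(a0 + y, b0 + e) for each hospital
aa, ba = a0 + 1, b0 + 66      # hospital A: small exposure, 1 death
ab, bb = a0 + 4, b0 + 1767    # hospital B: larger exposure, 4 deaths

draws = 100_000
count = 0
for _ in range(draws):
    la = random.gammavariate(aa, 1 / ba)  # gammavariate uses scale = 1/rate
    lb = random.gammavariate(ab, 1 / bb)
    count += 1 if la < lb else 0

print(round(count / draws, 2))  # close to 0.57
```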


### Introduction to Hypothesis Testing. Point estimation and confidence intervals are useful statistical inference procedures.

Introduction to Hypothesis Testing Point estimation and confidence intervals are useful statistical inference procedures. Another type of inference is used frequently used concerns tests of hypotheses.

### Time Series Analysis

Time Series Analysis hm@imm.dtu.dk Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby 1 Outline of the lecture Identification of univariate time series models, cont.:

### Sufficient Statistics and Exponential Family. 1 Statistics and Sufficient Statistics. Math 541: Statistical Theory II. Lecturer: Songfeng Zheng

Math 541: Statistical Theory II Lecturer: Songfeng Zheng Sufficient Statistics and Exponential Family 1 Statistics and Sufficient Statistics Suppose we have a random sample X 1,, X n taken from a distribution

### Chapter 4 Lecture Notes

Chapter 4 Lecture Notes Random Variables October 27, 2015 1 Section 4.1 Random Variables A random variable is typically a real-valued function defined on the sample space of some experiment. For instance,

### L10: Probability, statistics, and estimation theory

L10: Probability, statistics, and estimation theory Review of probability theory Bayes theorem Statistics and the Normal distribution Least Squares Error estimation Maximum Likelihood estimation Bayesian

### Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014

Probabilistic Models for Big Data Alex Davies and Roger Frigola University of Cambridge 13th February 2014 The State of Big Data Why probabilistic models for Big Data? 1. If you don t have to worry about

### Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special Distributions-VI Today, I am going to introduce

### CHAPTER 6: Continuous Uniform Distribution: 6.1. Definition: The density function of the continuous random variable X on the interval [A, B] is.

Some Continuous Probability Distributions CHAPTER 6: Continuous Uniform Distribution: 6. Definition: The density function of the continuous random variable X on the interval [A, B] is B A A x B f(x; A,

### Joint Exam 1/P Sample Exam 1

Joint Exam 1/P Sample Exam 1 Take this practice exam under strict exam conditions: Set a timer for 3 hours; Do not stop the timer for restroom breaks; Do not look at your notes. If you believe a question

### Math Review. for the Quantitative Reasoning Measure of the GRE revised General Test

Math Review for the Quantitative Reasoning Measure of the GRE revised General Test www.ets.org Overview This Math Review will familiarize you with the mathematical skills and concepts that are important

### A crash course in probability and Naïve Bayes classification

Probability theory A crash course in probability and Naïve Bayes classification Chapter 9 Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s

### Test Volume 14, Number 2. December 2005

Sociedad de Estadística e Investigación Operativa Test Volume 14, Number 2. December 2005 Intrinsic Credible Regions: An Objective Bayesian Approach to Interval Estimation José M. Bernardo Departamento

### Bayesian logistic betting strategy against probability forecasting. Akimichi Takemura, Univ. Tokyo. November 12, 2012

Bayesian logistic betting strategy against probability forecasting Akimichi Takemura, Univ. Tokyo (joint with Masayuki Kumon, Jing Li and Kei Takeuchi) November 12, 2012 arxiv:1204.3496. To appear in Stochastic

### Pattern Analysis. Logistic Regression. 12. Mai 2009. Joachim Hornegger. Chair of Pattern Recognition Erlangen University

Pattern Analysis Logistic Regression 12. Mai 2009 Joachim Hornegger Chair of Pattern Recognition Erlangen University Pattern Analysis 2 / 43 1 Logistic Regression Posteriors and the Logistic Function Decision

### Reject Inference in Credit Scoring. Jie-Men Mok

Reject Inference in Credit Scoring Jie-Men Mok BMI paper January 2009 ii Preface In the Master programme of Business Mathematics and Informatics (BMI), it is required to perform research on a business

### University of California, Berkeley, Statistics 134: Concepts of Probability

University of California, Berkeley, Statistics 134: Concepts of Probability Michael Lugo, Spring 211 Exam 2 solutions 1. A fair twenty-sided die has its faces labeled 1, 2, 3,..., 2. The die is rolled

### 38. STATISTICS. 38. Statistics 1. 38.1. Fundamental concepts

38. Statistics 1 38. STATISTICS Revised September 2013 by G. Cowan (RHUL). This chapter gives an overview of statistical methods used in high-energy physics. In statistics, we are interested in using a

### 0 x = 0.30 x = 1.10 x = 3.05 x = 4.15 x = 6 0.4 x = 12. f(x) =

. A mail-order computer business has si telephone lines. Let X denote the number of lines in use at a specified time. Suppose the pmf of X is as given in the accompanying table. 0 2 3 4 5 6 p(.0.5.20.25.20.06.04

### 1 Sufficient statistics

1 Sufficient statistics A statistic is a function T = rx 1, X 2,, X n of the random sample X 1, X 2,, X n. Examples are X n = 1 n s 2 = = X i, 1 n 1 the sample mean X i X n 2, the sample variance T 1 =

### Statistics 104: Section 6!

Page 1 Statistics 104: Section 6! TF: Deirdre (say: Dear-dra) Bloome Email: dbloome@fas.harvard.edu Section Times Thursday 2pm-3pm in SC 109, Thursday 5pm-6pm in SC 705 Office Hours: Thursday 6pm-7pm SC

### Detection of changes in variance using binary segmentation and optimal partitioning

Detection of changes in variance using binary segmentation and optimal partitioning Christian Rohrbeck Abstract This work explores the performance of binary segmentation and optimal partitioning in the

### CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 3 Problems with small populations 8 II. Why Random Sampling is Important 9 A myth,

### Spatial Statistics Chapter 3 Basics of areal data and areal data modeling

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data

### Worked examples Multiple Random Variables

Worked eamples Multiple Random Variables Eample Let X and Y be random variables that take on values from the set,, } (a) Find a joint probability mass assignment for which X and Y are independent, and

### STT315 Chapter 4 Random Variables & Probability Distributions KM. Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables

Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables Discrete vs. continuous random variables Examples of continuous distributions o Uniform o Exponential o Normal Recall: A random

### Practice problems for Homework 11 - Point Estimation

Practice problems for Homework 11 - Point Estimation 1. (10 marks) Suppose we want to select a random sample of size 5 from the current CS 3341 students. Which of the following strategies is the best:

### 6. Jointly Distributed Random Variables

6. Jointly Distributed Random Variables We are often interested in the relationship between two or more random variables. Example: A randomly chosen person may be a smoker and/or may get cancer. Definition.

### SOCIETY OF ACTUARIES/CASUALTY ACTUARIAL SOCIETY EXAM C CONSTRUCTION AND EVALUATION OF ACTUARIAL MODELS EXAM C SAMPLE QUESTIONS

SOCIETY OF ACTUARIES/CASUALTY ACTUARIAL SOCIETY EXAM C CONSTRUCTION AND EVALUATION OF ACTUARIAL MODELS EXAM C SAMPLE QUESTIONS Copyright 005 by the Society of Actuaries and the Casualty Actuarial Society

### The Two Envelopes Problem

1 The Two Envelopes Problem Rich Turner and Tom Quilter The Two Envelopes Problem, like its better known cousin, the Monty Hall problem, is seemingly paradoxical if you are not careful with your analysis.

### A Predictive Probability Design Software for Phase II Cancer Clinical Trials Version 1.0.0

A Predictive Probability Design Software for Phase II Cancer Clinical Trials Version 1.0.0 Nan Chen Diane Liu J. Jack Lee University of Texas M.D. Anderson Cancer Center March 23 2010 1. Calculation Method

### Lecture 14. Chapter 7: Probability. Rule 1: Rule 2: Rule 3: Nancy Pfenning Stats 1000

Lecture 4 Nancy Pfenning Stats 000 Chapter 7: Probability Last time we established some basic definitions and rules of probability: Rule : P (A C ) = P (A). Rule 2: In general, the probability of one event

### Chapter 2: Data quantifiers: sample mean, sample variance, sample standard deviation Quartiles, percentiles, median, interquartile range Dot diagrams

Review for Final Chapter 2: Data quantifiers: sample mean, sample variance, sample standard deviation Quartiles, percentiles, median, interquartile range Dot diagrams Histogram Boxplots Chapter 3: Set

### MINITAB ASSISTANT WHITE PAPER

MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

### Constrained Bayes and Empirical Bayes Estimator Applications in Insurance Pricing

Communications for Statistical Applications and Methods 2013, Vol 20, No 4, 321 327 DOI: http://dxdoiorg/105351/csam2013204321 Constrained Bayes and Empirical Bayes Estimator Applications in Insurance

### An introduction to the Beta-Binomial model

An introduction to the Beta-Binomial model COMPSCI 316: Computational Cognitive Science Dan Navarro & Amy Perfors University of Adelaide Abstract This note provides a detailed discussion of the beta-binomial

### Overview of Monte Carlo Simulation, Probability Review and Introduction to Matlab

Monte Carlo Simulation: IEOR E4703 Fall 2004 c 2004 by Martin Haugh Overview of Monte Carlo Simulation, Probability Review and Introduction to Matlab 1 Overview of Monte Carlo Simulation 1.1 Why use simulation?

### STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

STATS8: Introduction to Biostatistics Data Exploration Babak Shahbaba Department of Statistics, UCI Introduction After clearly defining the scientific problem, selecting a set of representative members

### Bayesian Updating with Discrete Priors Class 11, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom

1 Learning Goals Bayesian Updating with Discrete Priors Class 11, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom 1. Be able to apply Bayes theorem to compute probabilities. 2. Be able to identify

### Poisson Models for Count Data

Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

### Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March 2015. Due:-March 25, 2015.

Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment -3, Probability and Statistics, March 05. Due:-March 5, 05.. Show that the function 0 for x < x+ F (x) = 4 for x < for x

### Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

### Review of Random Variables

Chapter 1 Review of Random Variables Updated: January 16, 2015 This chapter reviews basic probability concepts that are necessary for the modeling and statistical analysis of financial data. 1.1 Random

### 7.1 The Hazard and Survival Functions

Chapter 7 Survival Models Our final chapter concerns models for the analysis of data which have three main characteristics: (1) the dependent variable or response is the waiting time until the occurrence

### Master s Theory Exam Spring 2006

Spring 2006 This exam contains 7 questions. You should attempt them all. Each question is divided into parts to help lead you through the material. You should attempt to complete as much of each problem

### The Exponential Distribution

21 The Exponential Distribution From Discrete-Time to Continuous-Time: In Chapter 6 of the text we will be considering Markov processes in continuous time. In a sense, we already have a very good understanding

### Lecture notes for Choice Under Uncertainty

Lecture notes for Choice Under Uncertainty 1. Introduction In this lecture we examine the theory of decision-making under uncertainty and its application to the demand for insurance. The undergraduate

### 2DI36 Statistics. 2DI36 Part II (Chapter 7 of MR)

2DI36 Statistics 2DI36 Part II (Chapter 7 of MR) What Have we Done so Far? Last time we introduced the concept of a dataset and seen how we can represent it in various ways But, how did this dataset came

### , for x = 0, 1, 2, 3,... (4.1) (1 + 1/n) n = 2.71828... b x /x! = e b, x=0

Chapter 4 The Poisson Distribution 4.1 The Fish Distribution? The Poisson distribution is named after Simeon-Denis Poisson (1781 1840). In addition, poisson is French for fish. In this chapter we will

### What is Statistics? Lecture 1. Introduction and probability review. Idea of parametric inference

0. 1. Introduction and probability review 1.1. What is Statistics? What is Statistics? Lecture 1. Introduction and probability review There are many definitions: I will use A set of principle and procedures

### 7 Hypothesis testing - one sample tests

7 Hypothesis testing - one sample tests 7.1 Introduction Definition 7.1 A hypothesis is a statement about a population parameter. Example A hypothesis might be that the mean age of students taking MAS113X

### Lecture 2 ESTIMATING THE SURVIVAL FUNCTION. One-sample nonparametric methods

Lecture 2 ESTIMATING THE SURVIVAL FUNCTION One-sample nonparametric methods There are commonly three methods for estimating a survivorship function S(t) = P (T > t) without resorting to parametric models:

### The Delta Method and Applications

Chapter 5 The Delta Method and Applications 5.1 Linear approximations of functions In the simplest form of the central limit theorem, Theorem 4.18, we consider a sequence X 1, X,... of independent and

### Experimental Design. Power and Sample Size Determination. Proportions. Proportions. Confidence Interval for p. The Binomial Test

Experimental Design Power and Sample Size Determination Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison November 3 8, 2011 To this point in the semester, we have largely

### Likelihood Approaches for Trial Designs in Early Phase Oncology

Likelihood Approaches for Trial Designs in Early Phase Oncology Clinical Trials Elizabeth Garrett-Mayer, PhD Cody Chiuzan, PhD Hollings Cancer Center Department of Public Health Sciences Medical University

### STAT 315: HOW TO CHOOSE A DISTRIBUTION FOR A RANDOM VARIABLE

STAT 315: HOW TO CHOOSE A DISTRIBUTION FOR A RANDOM VARIABLE TROY BUTLER 1. Random variables and distributions We are often presented with descriptions of problems involving some level of uncertainty about

### Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, and Discrete Changes

Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, Discrete Changes JunXuJ.ScottLong Indiana University August 22, 2005 The paper provides technical details on

### 1.1 Introduction, and Review of Probability Theory... 3. 1.1.1 Random Variable, Range, Types of Random Variables... 3. 1.1.2 CDF, PDF, Quantiles...

MATH4427 Notebook 1 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 2009-2016 by Jenny A. Baglivo. All Rights Reserved. Contents 1 MATH4427 Notebook 1 3 1.1 Introduction, and Review of Probability