# Maximum Likelihood Estimation

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for an unknown parameter θ. It was introduced by R. A. Fisher, a great English mathematical statistician, in Maximum likelihood estimation (MLE) can be applied in most problems, it has a strong intuitive appeal, and often yields a reasonable estimator of θ. Furthermore, if the sample is large, the method will yield an excellent estimator of θ. For these reasons, the method of maximum likelihood is probably the most widely used method of estimation in statistics. Suppose that the random variables X 1,, X n form a random sample from a distribution f(x θ); if X is continuous random variable, f(x θ) is pdf, if X is discrete random variable, f(x θ) is point mass function. We use the given symbol to represent that the distribution also depends on a parameter θ, where θ could be a real-valued unknown parameter or a vector of parameters. For every observed random sample x 1,, x n, we define f(x 1,, x n θ) = f(x 1 θ) f(x n θ) (1) If f(x θ) is pdf, f(x 1,, x n θ) is the joint density function; if f(x θ) is pmf, f(x 1,, x n θ) is the joint probability. Now we call f(x 1,, x n θ) as the likelihood function. As we can see, the likelihood function depends on the unknown parameter θ, and it is always denoted as L(θ). Suppose, for the moment, that the observed random sample x 1,, x n came from a discrete distribution. If an estimate of θ must be selected, we would certainly not consider any value of θ for which it would have been impossible to obtain the data x 1,, x n that was actually observed. Furthermore, suppose that the probability f(x 1,, x n θ) of obtaining the actual observed data x 1,, x n is very high when θ has a particular value, say θ = θ 0, and is very small for every other value of θ. Then we would naturally estimate the value of θ to be θ 0. When the sample comes from a continuous distribution, it would again be natural to try to find a value of θ for which the probability density f(x 1,, x n θ) is large, and to use this value as an estimate of θ. For any given observed data x 1,, x n, we are led by this reasoning to consider a value of θ for which the likelihood function L(θ) is a maximum and to use this value as an estimate of θ. 1

2 2 The meaning of maximum likelihood is as follows. We choose the parameter that makes the likelihood of having the obtained data at hand maximum. With discrete distributions, the likelihood is the same as the probability. We choose the parameter for the density that maximizes the probability of the data coming from it. Theoretically, if we had no actual data, maximizing the likelihood function will give us a function of n random variables X 1,, X n, which we shall call maximum likelihood estimate ˆθ. When there are actual data, the estimate takes a particular numerical value, which will be the maximum likelihood estimator. MLE requires us to maximum the likelihood function L(θ) with respect to the unknown parameter θ. From Eqn. 1, L(θ) is defined as a product of n terms, which is not easy to be maximized. Maximizing L(θ) is equivalent to maximizing log L(θ) because log is a monotonic increasing function. We define log L(θ) as log likelihood function, we denote it as l(θ), i.e. n l(θ) = log L(θ) = log f(x i θ) = log f(x i θ) Maximizing l(θ) with respect to θ will give us the MLE estimation. 2 Examples Example 1: Suppose that X is a discrete random variable with the following probability mass function: where 0 θ 1 is a parameter. The following 10 independent observations X P (X) 2θ/3 θ/3 2(1 θ)/3 (1 θ)/3 were taken from such a distribution: (3,0,2,1,3,2,1,0,2,1). What is the maximum likelihood estimate of θ. Solution: Since the sample is (3,0,2,1,3,2,1,0,2,1), the likelihood is L(θ) = P (X = 3)P (X = 0)P (X = 2)P (X = 1)P (X = 3) P (X = 2)P (X = 1)P (X = 0)P (X = 2)P (X = 1) (2) Substituting from the probability distribution given above, we have n L(θ) = P (X i θ) = ( 2θ 3 ) 2 ( θ 3 ) 3 ( ) 3 ( 2(1 θ) 1 θ 3 3 Clearly, the likelihood function L(θ) is not easy to maximize. ) 2

3 3 Let us look at the log likelihood function l(θ) = log L(θ) = log P (X i θ) = 2 (log 2 ) 3 + log θ + 3 (log 1 ) 3 + log θ + 3 (log 2 ) 3 + log(1 θ) + 2 (log 1 ) 3 + log(1 θ) = C + 5 log θ + 5 log(1 θ) where C is a constant which does not depend on θ. It can be seen that the log likelihood function is easier to maximize compared to the likelihood function. Let the derivative of l(θ) with respect to θ be zero: dl(θ) dθ = 5 θ 5 1 θ = 0 and the solution gives us the MLE, which is ˆθ = 0.5. We remember that the method of moment estimation is ˆθ = 5/12, which is different from MLE. Example 2: Suppose X 1, X 2,, X n are i.i.d. random variables with density function f(x σ) = 1 exp ( x ) 2σ σ, please find the maximum likelihood estimate of σ. Solution: The log-likelihood function is Let the derivative with respect to θ be zero: [ l(σ) = log 2 log σ X ] i σ [ l (σ) = 1 σ + X ] i σ 2 and this gives us the MLE for σ as ˆσ = = n σ + n X i σ 2 = 0 n X i Again this is different from the method of moment estimation which is n ˆσ = n X 2 i 2n Example 3: Use the method of moment to estimate the parameters µ and σ for the normal density f(x µ, σ 2 ) = 1 } (x µ)2 exp, 2πσ 2σ 2

4 4 based on a random sample X 1,, X n. Solution: In this example, we have two unknown parameters, µ and σ, therefore the parameter θ = (µ, σ) is a vector. We first write out the log likelihood function as l(µ, σ) = [ log σ 1 ] 1 log 2π 2 2σ (X 2 i µ) 2 = n log σ n 1 log 2π 2 2σ 2 Setting the partial derivative to be 0, we have (X i µ) 2 l(µ, σ) µ = 1 n (X σ 2 i µ) = 0 l(µ, σ) σ = n σ + n σ 3 (X i µ) 2 = 0 Solving these equations will give us the MLE for µ and σ: ˆµ = X and ˆσ = 1 n (X i X) 2 This time the MLE is the same as the result of method of moment. From these examples, we can see that the maximum likelihood result may or may not be the same as the result of method of moment. Example 4: The Pareto distribution has been used in economics as a model for a density function with a slowly decaying tail: f(x x 0, θ) = θx θ 0x θ 1, x x 0, θ > 1 Assume that x 0 > 0 is given and that X 1, X 2,, X n is an i.i.d. sample. Find the MLE of θ. Solution: The log-likelihood function is l(θ) = log f(x i θ) = (log θ + θ log x 0 (θ + 1) log X i ) = n log θ + nθ log x 0 (θ + 1) log X i Let the derivative with respect to θ be zero: dl(θ) dθ = n θ + n log x 0 log X i = 0

5 5 Solving the equation yields the MLE of θ: ˆθ MLE = 1 log X log x 0 Example 5: Suppose that X 1,, X n form a random sample from a uniform distribution on the interval (0, θ), where of the parameter θ > 0 but is unknown. Please find MLE of θ. Solution: The pdf of each observation has the following form: 1/θ, for 0 x θ (3) Therefore, the likelihood function has the form 1, for 0 x L(θ) = θ n i θ (i = 1,, n) It can be seen that the MLE of θ must be a value of θ for which θ x i for i = 1,, n and which maximizes 1/θ n among all such values. Since 1/θ n is a decreasing function of θ, the estimate will be the smallest possible value of θ such that θ x i for i = 1,, n. This value is θ = max(x 1,, x n ), it follows that the MLE of θ is ˆθ = max(x 1,, X n ). It should be remarked that in this example, the MLE ˆθ does not seem to be a suitable estimator of θ. We know that max(x 1,, X n ) < θ with probability 1, and therefore ˆθ surely underestimates the value of θ. Example 6: Suppose again that X 1,, X n form a random sample from a uniform distribution on the interval (0, θ), where of the parameter θ > 0 but is unknown. However, suppose now we write the density function as 1/θ, for 0 < x < θ (4) We will prove that in this case, the MLE for θ does not exist. Proof: The only difference between Eqn. 3 and Eqn. 4 is that the value of the pdf at the two endpints 0 and θ has been changed by replacing the weak inequalities in Eqn. 3 with strict inequalities in Eqn. 4. Either equation could be used as the pdf of the uniform distribution. However, if Eqn. 4 is used as the pdf, then an MLE of θ will be a value of θ for which θ > x i for i = 1,, n and which maximizes 1/θ n among all such values. It should be noted that the possible values of θ no longer include the value θ = max(x 1,, x n ), since θ must be strictly greater than each observed value x i for i = 1,, n. Since θ can be chosen arbitrarily close to the value max(x 1,, x n ) but cannot be chosen equal to this value, it follows that the MLE of θ does not exist in this case.

6 6 Example 5 and 6 illustrate one shortcoming of the concept of an MLE. We know that it is irrelevant whether the pdf of the uniform distribution is chosen to be equal to 1/θ over the open interval 0 < x < θ or over the closed interval 0 x θ. Now, however, we see that the existence of an MLE depends on this typically irrelevant and unimportant choice. This difficulty is easily avoided in Example 5 by using the pdf given by Eqn. 3 rather than that given by Eqn. 4. In many other problems, as well, in which there is a difficulty of this type in regard to the existence of an MLE, the difficulty can be avoided simply by choosing one particular appropriate version of the pdf to represent the given distribution. Example 7: Suppose that X 1,, X n form a random sample from a uniform distribution on the interval (θ, θ + 1), where the value of the parameter θ is unknown ( < θ < ). Clearly, the density function is 1, for θ x θ + 1 We will see that the MLE for θ is not unique. Proof: In this example, the likelihood function is 1, for θ xi θ + 1 (i = 1,, n) L(θ) = The condition that θ x i for i = 1,, n is equivalent to the condition that θ min(x 1,, x n ). Similarly, the condition that x i θ + 1 for i = 1,, n is equivalent to the condition that θ max(x 1,, x n ) 1. Therefore, we can rewrite the the likelihood function as 1, for max(x1,, x L(θ) = n ) 1 θ min(x 1,, x n ) Thus, we can select any value in the interval [max(x 1,, x n ) 1, min(x 1,, x n )] as the MLE for θ. Therefore, the MLE is not uniquely specified in this example. 3 Exercises Exercise 1: Let X 1,, X n be an i.i.d. sample from a Poisson distribution with parameter λ, i.e., P (X = x λ) = λx e λ. x! Please find the MLE of the parameter λ. Exercise 2: Let X 1,, X n be an i.i.d. sample from an exponential distribution with the density function f(x β) = 1 β e x β, with 0 x <.

7 7 Please find the MLE of the parameter β. Exercise 3: Gamma distribution has a density function as f(x α, λ) = 1 Γ(α) λα x α 1 e λx, with 0 x <. Suppose the parameter α is known, please find the MLE of λ based on an i.i.d. sample X 1,, X n. Exercise 4: Suppose that X 1,, X n form a random sample from a distribution for which the pdf f(x θ) is as follows: θx θ 1, for 0 < x < 1 0, for x 0 Also suppose that the value of θ is unknown (θ > 0). Find the MLE of θ. Exercise 5: Suppose that X 1,, X n form a random sample from a distribution for which the pdf f(x θ) is as follows: 1 2 e x θ for < x < Also suppose that the value of θ is unknown ( < θ < ). Find the MLE of θ. Exercise 6: Suppose that X 1,, X n form a random sample from a distribution for which the pdf f(x θ) is as follows: e θ x, for x > θ 0, for x θ Also suppose that the value of θ is unknown ( < θ < ). a) Show that the MLE of θ does not exist. b) Determine another version of the pdf of this same distribution for which the MLE of θ will exist, and find this estimate. Exercise 7: Suppose that X 1,, X n form a random sample from a uniform distribution on the interval (θ 1, θ 2 ), with the pdf as follows: 1 θ 2 θ 1, for θ 1 x θ 2 Also suppose that the values of θ 1 and θ 2 are unknown ( < θ 1 < θ 2 < ). Find the MLE s of θ 1 and θ 2.

### Practice problems for Homework 11 - Point Estimation

Practice problems for Homework 11 - Point Estimation 1. (10 marks) Suppose we want to select a random sample of size 5 from the current CS 3341 students. Which of the following strategies is the best:

### 1 Prior Probability and Posterior Probability

Math 541: Statistical Theory II Bayesian Approach to Parameter Estimation Lecturer: Songfeng Zheng 1 Prior Probability and Posterior Probability Consider now a problem of statistical inference in which

### Exact Confidence Intervals

Math 541: Statistical Theory II Instructor: Songfeng Zheng Exact Confidence Intervals Confidence intervals provide an alternative to using an estimator ˆθ when we wish to estimate an unknown parameter

### Principle of Data Reduction

Chapter 6 Principle of Data Reduction 6.1 Introduction An experimenter uses the information in a sample X 1,..., X n to make inferences about an unknown parameter θ. If the sample size n is large, then

### Sufficient Statistics and Exponential Family. 1 Statistics and Sufficient Statistics. Math 541: Statistical Theory II. Lecturer: Songfeng Zheng

Math 541: Statistical Theory II Lecturer: Songfeng Zheng Sufficient Statistics and Exponential Family 1 Statistics and Sufficient Statistics Suppose we have a random sample X 1,, X n taken from a distribution

### 1 Sufficient statistics

1 Sufficient statistics A statistic is a function T = rx 1, X 2,, X n of the random sample X 1, X 2,, X n. Examples are X n = 1 n s 2 = = X i, 1 n 1 the sample mean X i X n 2, the sample variance T 1 =

### Parametric Models Part I: Maximum Likelihood and Bayesian Density Estimation

Parametric Models Part I: Maximum Likelihood and Bayesian Density Estimation Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2015 CS 551, Fall 2015

### Efficiency and the Cramér-Rao Inequality

Chapter Efficiency and the Cramér-Rao Inequality Clearly we would like an unbiased estimator ˆφ (X of φ (θ to produce, in the long run, estimates which are fairly concentrated i.e. have high precision.

### MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators...

MATH4427 Notebook 2 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 2009-2016 by Jenny A. Baglivo. All Rights Reserved. Contents 2 MATH4427 Notebook 2 3 2.1 Definitions and Examples...................................

### Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March 2015. Due:-March 25, 2015.

Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment -3, Probability and Statistics, March 05. Due:-March 5, 05.. Show that the function 0 for x < x+ F (x) = 4 for x < for x

### Basics of Statistical Machine Learning

CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar

### Using pivots to construct confidence intervals. In Example 41 we used the fact that

Using pivots to construct confidence intervals In Example 41 we used the fact that Q( X, µ) = X µ σ/ n N(0, 1) for all µ. We then said Q( X, µ) z α/2 with probability 1 α, and converted this into a statement

### INSURANCE RISK THEORY (Problems)

INSURANCE RISK THEORY (Problems) 1 Counting random variables 1. (Lack of memory property) Let X be a geometric distributed random variable with parameter p (, 1), (X Ge (p)). Show that for all n, m =,

### CHAPTER 6: Continuous Uniform Distribution: 6.1. Definition: The density function of the continuous random variable X on the interval [A, B] is.

Some Continuous Probability Distributions CHAPTER 6: Continuous Uniform Distribution: 6. Definition: The density function of the continuous random variable X on the interval [A, B] is B A A x B f(x; A,

### Errata and updates for ASM Exam C/Exam 4 Manual (Sixteenth Edition) sorted by page

Errata for ASM Exam C/4 Study Manual (Sixteenth Edition) Sorted by Page 1 Errata and updates for ASM Exam C/Exam 4 Manual (Sixteenth Edition) sorted by page Practice exam 1:9, 1:22, 1:29, 9:5, and 10:8

α α λ α = = λ λ α ψ = = α α α λ λ ψ α = + β = > θ θ β > β β θ θ θ β θ β γ θ β = γ θ > β > γ θ β γ = θ β = θ β = θ β = β θ = β β θ = = = β β θ = + α α α α α = = λ λ λ λ λ λ λ = λ λ α α α α λ ψ + α =

### A Uniform Asymptotic Estimate for Discounted Aggregate Claims with Subexponential Tails

12th International Congress on Insurance: Mathematics and Economics July 16-18, 2008 A Uniform Asymptotic Estimate for Discounted Aggregate Claims with Subexponential Tails XUEMIAO HAO (Based on a joint

### Chapter 9: Hypothesis Testing Sections

Chapter 9: Hypothesis Testing Sections - we are still here Skip: 9.2 Testing Simple Hypotheses Skip: 9.3 Uniformly Most Powerful Tests Skip: 9.4 Two-Sided Alternatives 9.5 The t Test 9.6 Comparing the

### 5. Continuous Random Variables

5. Continuous Random Variables Continuous random variables can take any value in an interval. They are used to model physical characteristics such as time, length, position, etc. Examples (i) Let X be

### NPTEL STRUCTURAL RELIABILITY

NPTEL Course On STRUCTURAL RELIABILITY Module # 02 Lecture 6 Course Format: Web Instructor: Dr. Arunasis Chakraborty Department of Civil Engineering Indian Institute of Technology Guwahati 6. Lecture 06:

### Introduction to Hypothesis Testing. Point estimation and confidence intervals are useful statistical inference procedures.

Introduction to Hypothesis Testing Point estimation and confidence intervals are useful statistical inference procedures. Another type of inference is used frequently used concerns tests of hypotheses.

### Section 5.1 Continuous Random Variables: Introduction

Section 5. Continuous Random Variables: Introduction Not all random variables are discrete. For example:. Waiting times for anything (train, arrival of customer, production of mrna molecule from gene,

### Hypothesis Testing. 1 Introduction. 2 Hypotheses. 2.1 Null and Alternative Hypotheses. 2.2 Simple vs. Composite. 2.3 One-Sided and Two-Sided Tests

Hypothesis Testing 1 Introduction This document is a simple tutorial on hypothesis testing. It presents the basic concepts and definitions as well as some frequently asked questions associated with hypothesis

### SYSM 6304: Risk and Decision Analysis Lecture 3 Monte Carlo Simulation

SYSM 6304: Risk and Decision Analysis Lecture 3 Monte Carlo Simulation M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu September 19, 2015 Outline

### Section 6.1 Joint Distribution Functions

Section 6.1 Joint Distribution Functions We often care about more than one random variable at a time. DEFINITION: For any two random variables X and Y the joint cumulative probability distribution function

### P (x) 0. Discrete random variables Expected value. The expected value, mean or average of a random variable x is: xp (x) = v i P (v i )

Discrete random variables Probability mass function Given a discrete random variable X taking values in X = {v 1,..., v m }, its probability mass function P : X [0, 1] is defined as: P (v i ) = Pr[X =

### A LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA

REVSTAT Statistical Journal Volume 4, Number 2, June 2006, 131 142 A LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA Authors: Daiane Aparecida Zuanetti Departamento de Estatística, Universidade Federal de São

### Properties of Future Lifetime Distributions and Estimation

Properties of Future Lifetime Distributions and Estimation Harmanpreet Singh Kapoor and Kanchan Jain Abstract Distributional properties of continuous future lifetime of an individual aged x have been studied.

### Hypothesis Testing COMP 245 STATISTICS. Dr N A Heard. 1 Hypothesis Testing 2 1.1 Introduction... 2 1.2 Error Rates and Power of a Test...

Hypothesis Testing COMP 45 STATISTICS Dr N A Heard Contents 1 Hypothesis Testing 1.1 Introduction........................................ 1. Error Rates and Power of a Test.............................

### 2WB05 Simulation Lecture 8: Generating random variables

2WB05 Simulation Lecture 8: Generating random variables Marko Boon http://www.win.tue.nl/courses/2wb05 January 7, 2013 Outline 2/36 1. How do we generate random variables? 2. Fitting distributions Generating

### Final Mathematics 5010, Section 1, Fall 2004 Instructor: D.A. Levin

Final Mathematics 51, Section 1, Fall 24 Instructor: D.A. Levin Name YOU MUST SHOW YOUR WORK TO RECEIVE CREDIT. A CORRECT ANSWER WITHOUT SHOWING YOUR REASONING WILL NOT RECEIVE CREDIT. Problem Points Possible

### Overview of Monte Carlo Simulation, Probability Review and Introduction to Matlab

Monte Carlo Simulation: IEOR E4703 Fall 2004 c 2004 by Martin Haugh Overview of Monte Carlo Simulation, Probability Review and Introduction to Matlab 1 Overview of Monte Carlo Simulation 1.1 Why use simulation?

### Chapter 6: Point Estimation. Fall 2011. - Probability & Statistics

STAT355 Chapter 6: Point Estimation Fall 2011 Chapter Fall 2011 6: Point1 Estimat / 18 Chap 6 - Point Estimation 1 6.1 Some general Concepts of Point Estimation Point Estimate Unbiasedness Principle of

### Solution Using the geometric series a/(1 r) = x=1. x=1. Problem For each of the following distributions, compute

Math 472 Homework Assignment 1 Problem 1.9.2. Let p(x) 1/2 x, x 1, 2, 3,..., zero elsewhere, be the pmf of the random variable X. Find the mgf, the mean, and the variance of X. Solution 1.9.2. Using the

### Sampling and Hypothesis Testing

Population and sample Sampling and Hypothesis Testing Allin Cottrell Population : an entire set of objects or units of observation of one sort or another. Sample : subset of a population. Parameter versus

### Hypothesis Testing for Beginners

Hypothesis Testing for Beginners Michele Piffer LSE August, 2011 Michele Piffer (LSE) Hypothesis Testing for Beginners August, 2011 1 / 53 One year ago a friend asked me to put down some easy-to-read notes

### Stats on the TI 83 and TI 84 Calculator

Stats on the TI 83 and TI 84 Calculator Entering the sample values STAT button Left bracket { Right bracket } Store (STO) List L1 Comma Enter Example: Sample data are {5, 10, 15, 20} 1. Press 2 ND and

### Reliability and Risk Analysis. Analysis of Model Tasks from Selected Application Areas Using a Computer (software R, Excel)

Reliability and Risk Analysis Analysis of Model Tasks from Selected Application Areas Using a Computer (software R, Excel) Task The aim of this study is to assess the wastewater treatment plant (WWTP)

### Test of Hypotheses. Since the Neyman-Pearson approach involves two statistical hypotheses, one has to decide which one

Test of Hypotheses Hypothesis, Test Statistic, and Rejection Region Imagine that you play a repeated Bernoulli game: you win \$1 if head and lose \$1 if tail. After 10 plays, you lost \$2 in net (4 heads

### Lecture Notes on Elasticity of Substitution

Lecture Notes on Elasticity of Substitution Ted Bergstrom, UCSB Economics 210A March 3, 2011 Today s featured guest is the elasticity of substitution. Elasticity of a function of a single variable Before

### Lecture 8. Confidence intervals and the central limit theorem

Lecture 8. Confidence intervals and the central limit theorem Mathematical Statistics and Discrete Mathematics November 25th, 2015 1 / 15 Central limit theorem Let X 1, X 2,... X n be a random sample of

### [Chapter 10. Hypothesis Testing]

[Chapter 10. Hypothesis Testing] 10.1 Introduction 10.2 Elements of a Statistical Test 10.3 Common Large-Sample Tests 10.4 Calculating Type II Error Probabilities and Finding the Sample Size for Z Tests

### For a partition B 1,..., B n, where B i B j = for i. A = (A B 1 ) (A B 2 ),..., (A B n ) and thus. P (A) = P (A B i ) = P (A B i )P (B i )

Probability Review 15.075 Cynthia Rudin A probability space, defined by Kolmogorov (1903-1987) consists of: A set of outcomes S, e.g., for the roll of a die, S = {1, 2, 3, 4, 5, 6}, 1 1 2 1 6 for the roll

### Change of Continuous Random Variable

Change of Continuous Random Variable All you are responsible for from this lecture is how to implement the Engineer s Way (see page 4) to compute how the probability density function changes when we make

### The Dirichlet-Multinomial and Dirichlet-Categorical models for Bayesian inference

The Dirichlet-Multinomial and Dirichlet-Categorical models for Bayesian inference Stephen Tu tu.stephenl@gmail.com 1 Introduction This document collects in one place various results for both the Dirichlet-multinomial

### Determining distribution parameters from quantiles

Determining distribution parameters from quantiles John D. Cook Department of Biostatistics The University of Texas M. D. Anderson Cancer Center P. O. Box 301402 Unit 1409 Houston, TX 77230-1402 USA cook@mderson.org

### 1 The Pareto Distribution

Estimating the Parameters of a Pareto Distribution Introducing a Quantile Regression Method Joseph Lee Petersen Introduction. A broad approach to using correlation coefficients for parameter estimation

### Dongfeng Li. Autumn 2010

Autumn 2010 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis

### Lecture 7: Continuous Random Variables

Lecture 7: Continuous Random Variables 21 September 2005 1 Our First Continuous Random Variable The back of the lecture hall is roughly 10 meters across. Suppose it were exactly 10 meters, and consider

### MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES Contents 1. Random variables and measurable functions 2. Cumulative distribution functions 3. Discrete

### Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special Distributions-VI Today, I am going to introduce

### Lecture Notes on Elasticity of Substitution

Lecture Notes on Elasticity of Substitution Ted Bergstrom, UCSB Economics 20A October 26, 205 Today s featured guest is the elasticity of substitution. Elasticity of a function of a single variable Before

### Part 2: One-parameter models

Part 2: One-parameter models Bernoilli/binomial models Return to iid Y 1,...,Y n Bin(1, θ). The sampling model/likelihood is p(y 1,...,y n θ) =θ P y i (1 θ) n P y i When combined with a prior p(θ), Bayes

### Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 18. A Brief Introduction to Continuous Probability

CS 7 Discrete Mathematics and Probability Theory Fall 29 Satish Rao, David Tse Note 8 A Brief Introduction to Continuous Probability Up to now we have focused exclusively on discrete probability spaces

### Bayesian Statistics in One Hour. Patrick Lam

Bayesian Statistics in One Hour Patrick Lam Outline Introduction Bayesian Models Applications Missing Data Hierarchical Models Outline Introduction Bayesian Models Applications Missing Data Hierarchical

### Math 461 Fall 2006 Test 2 Solutions

Math 461 Fall 2006 Test 2 Solutions Total points: 100. Do all questions. Explain all answers. No notes, books, or electronic devices. 1. [105+5 points] Assume X Exponential(λ). Justify the following two

### Exponential Distribution

Exponential Distribution Definition: Exponential distribution with parameter λ: { λe λx x 0 f(x) = 0 x < 0 The cdf: F(x) = x Mean E(X) = 1/λ. f(x)dx = Moment generating function: φ(t) = E[e tx ] = { 1

### MATH 10: Elementary Statistics and Probability Chapter 5: Continuous Random Variables

MATH 10: Elementary Statistics and Probability Chapter 5: Continuous Random Variables Tony Pourmohamad Department of Mathematics De Anza College Spring 2015 Objectives By the end of this set of slides,

### Math 319 Problem Set #3 Solution 21 February 2002

Math 319 Problem Set #3 Solution 21 February 2002 1. ( 2.1, problem 15) Find integers a 1, a 2, a 3, a 4, a 5 such that every integer x satisfies at least one of the congruences x a 1 (mod 2), x a 2 (mod

### 7 Hypothesis testing - one sample tests

7 Hypothesis testing - one sample tests 7.1 Introduction Definition 7.1 A hypothesis is a statement about a population parameter. Example A hypothesis might be that the mean age of students taking MAS113X

### 3. The Multivariate Normal Distribution

3. The Multivariate Normal Distribution 3.1 Introduction A generalization of the familiar bell shaped normal density to several dimensions plays a fundamental role in multivariate analysis While real data

### Practice with Proofs

Practice with Proofs October 6, 2014 Recall the following Definition 0.1. A function f is increasing if for every x, y in the domain of f, x < y = f(x) < f(y) 1. Prove that h(x) = x 3 is increasing, using

### 1 Inner Products and Norms on Real Vector Spaces

Math 373: Principles Techniques of Applied Mathematics Spring 29 The 2 Inner Product 1 Inner Products Norms on Real Vector Spaces Recall that an inner product on a real vector space V is a function from

### Estimating an ARMA Process

Statistics 910, #12 1 Overview Estimating an ARMA Process 1. Main ideas 2. Fitting autoregressions 3. Fitting with moving average components 4. Standard errors 5. Examples 6. Appendix: Simple estimators

### The Exponential Distribution

21 The Exponential Distribution From Discrete-Time to Continuous-Time: In Chapter 6 of the text we will be considering Markov processes in continuous time. In a sense, we already have a very good understanding

### Note on the EM Algorithm in Linear Regression Model

International Mathematical Forum 4 2009 no. 38 1883-1889 Note on the M Algorithm in Linear Regression Model Ji-Xia Wang and Yu Miao College of Mathematics and Information Science Henan Normal University

### Joint Exam 1/P Sample Exam 1

Joint Exam 1/P Sample Exam 1 Take this practice exam under strict exam conditions: Set a timer for 3 hours; Do not stop the timer for restroom breaks; Do not look at your notes. If you believe a question

### Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

### , for x = 0, 1, 2, 3,... (4.1) (1 + 1/n) n = 2.71828... b x /x! = e b, x=0

Chapter 4 The Poisson Distribution 4.1 The Fish Distribution? The Poisson distribution is named after Simeon-Denis Poisson (1781 1840). In addition, poisson is French for fish. In this chapter we will

### ( ) = P Z > = P( Z > 1) = 1 Φ(1) = 1 0.8413 = 0.1587 P X > 17

4.6 I company that manufactures and bottles of apple juice uses a machine that automatically fills 6 ounce bottles. There is some variation, however, in the amounts of liquid dispensed into the bottles

### Generalized Linear Models. Today: definition of GLM, maximum likelihood estimation. Involves choice of a link function (systematic component)

Generalized Linear Models Last time: definition of exponential family, derivation of mean and variance (memorize) Today: definition of GLM, maximum likelihood estimation Include predictors x i through

### Maximum Likelihood Estimation by R

Maximum Likelihood Estimation by R MTH 541/643 Instructor: Songfeng Zheng In the previous lectures, we demonstrated the basic procedure of MLE, and studied some examples. In the studied examples, we are

### Nonparametric adaptive age replacement with a one-cycle criterion

Nonparametric adaptive age replacement with a one-cycle criterion P. Coolen-Schrijner, F.P.A. Coolen Department of Mathematical Sciences University of Durham, Durham, DH1 3LE, UK e-mail: Pauline.Schrijner@durham.ac.uk

### 1.3. DOT PRODUCT 19. 6. If θ is the angle (between 0 and π) between two non-zero vectors u and v,

1.3. DOT PRODUCT 19 1.3 Dot Product 1.3.1 Definitions and Properties The dot product is the first way to multiply two vectors. The definition we will give below may appear arbitrary. But it is not. It

### Statistical Machine Learning

Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

### L10: Probability, statistics, and estimation theory

L10: Probability, statistics, and estimation theory Review of probability theory Bayes theorem Statistics and the Normal distribution Least Squares Error estimation Maximum Likelihood estimation Bayesian

### VISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA

VISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA Csilla Csendes University of Miskolc, Hungary Department of Applied Mathematics ICAM 2010 Probability density functions A random variable X has density

### Chapter 9: Hypothesis Testing Sections

Chapter 9: Hypothesis Testing Sections 9.1 Problems of Testing Hypotheses Skip: 9.2 Testing Simple Hypotheses Skip: 9.3 Uniformly Most Powerful Tests Skip: 9.4 Two-Sided Alternatives 9.5 The t Test 9.6

### The Neyman-Pearson lemma. The Neyman-Pearson lemma

The Neyman-Pearson lemma In practical hypothesis testing situations, there are typically many tests possible with significance level α for a null hypothesis versus alternative hypothesis. This leads to

### Online Appendix to Stochastic Imitative Game Dynamics with Committed Agents

Online Appendix to Stochastic Imitative Game Dynamics with Committed Agents William H. Sandholm January 6, 22 O.. Imitative protocols, mean dynamics, and equilibrium selection In this section, we consider

### Chapter 3 RANDOM VARIATE GENERATION

Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.

### Probability and Statistics

CHAPTER 2: RANDOM VARIABLES AND ASSOCIATED FUNCTIONS 2b - 0 Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be

### STT315 Chapter 4 Random Variables & Probability Distributions KM. Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables

Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables Discrete vs. continuous random variables Examples of continuous distributions o Uniform o Exponential o Normal Recall: A random

### 6. Distribution and Quantile Functions

Virtual Laboratories > 2. Distributions > 1 2 3 4 5 6 7 8 6. Distribution and Quantile Functions As usual, our starting point is a random experiment with probability measure P on an underlying sample spac

### Basic concepts and introduction to statistical inference

Basic concepts and introduction to statistical inference Anna Helga Jonsdottir Gunnar Stefansson Sigrun Helga Lund University of Iceland (UI) Basic concepts 1 / 19 A review of concepts Basic concepts Confidence

### Notes on the Negative Binomial Distribution

Notes on the Negative Binomial Distribution John D. Cook October 28, 2009 Abstract These notes give several properties of the negative binomial distribution. 1. Parameterizations 2. The connection between

### Estimating survival functions has interested statisticians for numerous years.

ZHAO, GUOLIN, M.A. Nonparametric and Parametric Survival Analysis of Censored Data with Possible Violation of Method Assumptions. (2008) Directed by Dr. Kirsten Doehler. 55pp. Estimating survival functions

### How to Conduct a Hypothesis Test

How to Conduct a Hypothesis Test The idea of hypothesis testing is relatively straightforward. In various studies we observe certain events. We must ask, is the event due to chance alone, or is there some

### Chapter 3: DISCRETE RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS

Chapter 3: DISCRETE RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS Part 4: Geometric Distribution Negative Binomial Distribution Hypergeometric Distribution Sections 3-7, 3-8 The remaining discrete random

### Inner Product Spaces

Math 571 Inner Product Spaces 1. Preliminaries An inner product space is a vector space V along with a function, called an inner product which associates each pair of vectors u, v with a scalar u, v, and

### Statistics 100A Homework 8 Solutions

Part : Chapter 7 Statistics A Homework 8 Solutions Ryan Rosario. A player throws a fair die and simultaneously flips a fair coin. If the coin lands heads, then she wins twice, and if tails, the one-half

### THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE. Alexander Barvinok

THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE Alexer Barvinok Papers are available at http://www.math.lsa.umich.edu/ barvinok/papers.html This is a joint work with J.A. Hartigan

### Definition and Properties of the Production Function: Lecture

Definition and Properties of the Production Function: Lecture II August 25, 2011 Definition and : Lecture A Brief Brush with Duality Cobb-Douglas Cost Minimization Lagrangian for the Cobb-Douglas Solution

### Inequalities of Analysis. Andrejs Treibergs. Fall 2014

USAC Colloquium Inequalities of Analysis Andrejs Treibergs University of Utah Fall 2014 2. USAC Lecture: Inequalities of Analysis The URL for these Beamer Slides: Inequalities of Analysis http://www.math.utah.edu/~treiberg/inequalitiesslides.pdf

### Probability Theory. Florian Herzog. A random variable is neither random nor variable. Gian-Carlo Rota, M.I.T..

Probability Theory A random variable is neither random nor variable. Gian-Carlo Rota, M.I.T.. Florian Herzog 2013 Probability space Probability space A probability space W is a unique triple W = {Ω, F,

### Schatten van de extreme value index voor afgeronde data (Engelse titel: Estimation of the extreme value index for imprecise data)

Technische Universiteit Delft Faculteit Elektrotechniek, Wiskunde en Informatica Delft Institute of Applied Mathematics Schatten van de extreme value index voor afgeronde data (Engelse titel: Estimation

### 6. Jointly Distributed Random Variables

6. Jointly Distributed Random Variables We are often interested in the relationship between two or more random variables. Example: A randomly chosen person may be a smoker and/or may get cancer. Definition.

### P(a X b) = f X (x)dx. A p.d.f. must integrate to one: f X (x)dx = 1. Z b

Continuous Random Variables The probability that a continuous random variable, X, has a value between a and b is computed by integrating its probability density function (p.d.f.) over the interval [a,b]: