Maximum Likelihood Estimation by R




MTH 541/643
Instructor: Songfeng Zheng

In the previous lectures, we demonstrated the basic procedure of MLE and studied some examples. In those examples, we were lucky: the MLE could be found by solving equations in closed form. But life is never easy. In applications, we usually do not have closed-form solutions, either because the probability distribution is complicated or because the equations obtained from the maximum likelihood principle are nonlinear. In these situations, we can use a computer to solve the problem. Many statistical software packages offer MLE as a standard procedure, but for the purpose of learning MLE, and for the purpose of learning a programming language, let us develop the code ourselves.

In this course, we use R for our computer programming. R is free software, and you can download it from http://www.r-project.org/. R is one of the most widely used software packages in the statistics community these days, together with SAS, SPSS, S-PLUS, and Stata, among others.

We will introduce R programming for MLE via an example. The Poisson distribution has been used by traffic engineers as a model for light traffic, based on the rationale that if the rate is approximately constant and the traffic is light (so that individual cars move independently of each other), the distribution of counts of cars in a given time interval or space area should be nearly Poisson (Gerlough and Schuhl 1955). The following table shows the number of right turns during 300 3-minute intervals at a specific intersection.

    n     frequency
    0     14
    1     30
    2     36
    3     68
    4     43
    5     43
    6     30
    7     14
    8     10
    9     6
    10    4
    11    1
    12    1
    13+   0

If we suppose that a Poisson model might be good for this dataset, we still need to find out which Poisson, that is, to estimate the parameter λ in the Poisson model:

    P(X = x) = λ^x e^(−λ) / x!

Of course, we can use the formula to calculate the MLE of the parameter λ in the Poisson model as λ̂ = X̄, the sample mean (please check this yourselves). For the purpose of demonstrating the use of R, let us just use this Poisson distribution as an example.

The first step is, of course, to input the data. If the data are stored in a file (*.txt, or in Excel format), we can read the data directly from the file. In this example, we can input the data directly:

    X <- c(rep(0,14), rep(1,30), rep(2,36), rep(3,68), rep(4,43),
           rep(5,43), rep(6,30), rep(7,14), rep(8,10), rep(9,6),
           rep(10,4), rep(11,1), rep(12,1))

Here rep(x, n) means repeat the number x for n times, and c() is the operation that combines all the numbers into a long vector. We can also plot the histogram of the data and compare it to the point mass function of the Poisson distribution, to gain a rough impression of whether a Poisson model is good or not:

    hist(X, main="Histogram of number of right turns", right=FALSE, prob=TRUE)

[Figure: histogram of the number of right turns]
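One way to make the comparison with the Poisson point mass function concrete is to overlay the pmf on the histogram, plugging in the sample mean for λ (which, as noted above, is the MLE). A minimal sketch:

    # overlay the Poisson pmf at the integer counts, using the
    # sample mean as a plug-in estimate of lambda
    lam.hat <- mean(X)
    points(0:12, dpois(0:12, lambda=lam.hat), pch=16, col="red")

If the red points track the bar heights reasonably well, the Poisson model is at least plausible.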

If we believe the Poisson model is good for the data, we need to estimate the parameter. Let us first get the size of the sample by using the following command:

    n <- length(X)

In order to obtain the MLE, we need to maximize the likelihood function or the log-likelihood function. R provides a function which can minimize an objective function; therefore, we define the negative log-likelihood function as follows:

    negloglike <- function(lam) {
        n*lam - sum(X)*log(lam) + sum(log(factorial(X)))
    }

The function body is just the negative of the Poisson log-likelihood: for observations x_1, ..., x_n, log L(λ) = Σ [x_i log λ − λ − log(x_i!)] = sum(X)·log(λ) − n·λ − Σ log(x_i!), and flipping the sign gives the expression above. Here the first line, negloglike <- function(lam), defines a function named negloglike whose input parameter is lam; function is a keyword in R which indicates that what follows is a function. The argument lam can also be a vector, in the case of two or more unknown parameters (see the exercise). The statements inside the braces are the body of the function. Please note that there should not be any unknown values in the function body except for the input parameter lam; for example, here you know X and n. Once the function is defined in R, you can evaluate it by giving it a value for lam. For example, you can type negloglike(0.3) to find the negative log-likelihood at λ = 0.3.

After we define the negative log-likelihood, we can perform the optimization as follows:

    out <- nlm(negloglike, p=c(0.5), hessian=TRUE)

Here nlm is the nonlinear minimization function provided by R. The first argument is the objective function to be minimized; the second argument, p=c(0.5), specifies the initial value of the unknown parameter (here we start at 0.5); and the third argument, hessian=TRUE, says that we want the Hessian matrix as part of the output. The optimization result and relevant information will be stored in a structure; in this example, the variable out stores that structure. You may get some warning messages like the following, which do not matter:

    Warning messages:
    1: In log(p) : NaNs produced
    2: In nlm(fn, p = c(0.5), hessian = TRUE) :
      NA/Inf replaced by maximum positive value
    3: In log(p) : NaNs produced
    4: In nlm(fn, p = c(0.5), hessian = TRUE) :
      NA/Inf replaced by maximum positive value
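As an aside, since this problem has only one unknown parameter, you can cross-check the nlm result with R's one-dimensional minimizer, optimize. A minimal sketch, assuming the interval c(0.1, 20) safely brackets the minimum:

    # one-dimensional cross-check of the nlm result
    opt <- optimize(negloglike, interval=c(0.1, 20))
    opt$minimum    # the minimizing value of lambda (the location, not the objective value)
    opt$objective  # the negative log-likelihood at that point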

If you want to see the result, just type the variable name (out in this case), and you will see the following information:

    > out
    $minimum
    [1] 667.183

    $estimate
    [1] 3.893331

    $gradient
    [1] -2.575476e-05

    $hessian
             [,1]
    [1,] 77.03948

    $code
    [1] 1

    $iterations
    [1] 10

Here out$minimum is the minimized negative log-likelihood, out$estimate is the maximum likelihood estimate of the parameter, out$gradient is the gradient of the negative log-likelihood function at this point, out$hessian is the value of the second derivative at this point, and out$iterations is the number of iterations needed to converge (out$code equal to 1 indicates that the relative gradient is close to zero, so the iterate is probably a solution). The notation $ extracts a component of the output variable out. For this Poisson distribution, it is well known that the MLE is the mean of the observed values; type the following command to find the mean value:

    > mean(X)
    [1] 3.893333

which is very close to the result from the code.
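A common use of the returned Hessian: evaluated at the MLE, the Hessian of the negative log-likelihood is the observed Fisher information, so its inverse gives an approximate variance for the estimate. A minimal sketch (the 0.114 value below is what the numbers above imply, since for the Poisson model the standard error is approximately sqrt(λ̂/n)):

    # approximate standard error from the observed Fisher information
    se <- sqrt(1/out$hessian[1,1])
    se                                  # about 0.114 here
    out$estimate + c(-1.96, 1.96)*se    # rough 95% confidence interval for lambda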

Exercise (please fit a gamma distribution, plot the graphs, and turn in the results and the code; please type your solution into a file):

The file gamma-arrivals.txt contains another set of gamma-ray data, this one consisting of the times between arrivals (inter-arrival times) of 3935 photons (units are seconds). Assume the gamma distribution is a good model for the data:

    f(x; α, β) = β^α x^(α−1) e^(−βx) / Γ(α),  for x ≥ 0,

where both α and β are unknown.

1. Make a histogram of the inter-arrival times. Does it appear that a gamma distribution would be a plausible model?
2. Fit the parameters by the method of moments and by maximum likelihood. How do the estimates compare?
3. Plot the two fitted gamma densities on top of the histogram. Do the fits look reasonable?

Notes:

a. In order to read the data, if you are connected to the Internet, you need the following:

    x <- scan("http://people.missouristate.edu/songfengzheng/teaching/mth541/gamma-arrivals.txt")

The URL can be replaced by a file path, if desired. For example:

    x <- scan("C:\\TEACHING\\MATH 541\\Data_Rice\\ASCII Comma\\Chapter 8\\gamma-arrivals.txt")

This command also has the advantage of importing the data as a vector instead of a data frame, making any subsequent conversion unnecessary.

b. You do not need to use the function nlm for the method of moments; instead, you need to derive the formulas for the estimators and write code to calculate them directly.

c. To plot the density of a distribution, use the following:

    curve(dgamma(x, shape=alpha_hat_mle, scale=1.0/beta_hat_mle))

d. To put the density on top of the histogram, you can use the following:

    curve(dgamma(x, shape=alpha_hat_mle, scale=1.0/beta_hat_mle), col='blue', add=TRUE)
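As a hint on structure only, here is a minimal sketch of how the two-parameter fit might be set up, following the same pattern as the Poisson example. The names negloglike and theta are illustrative, and the starting values assume the method-of-moments estimators have already been derived (for the gamma distribution, the mean is α/β and the variance is α/β²):

    # sketch: gamma negative log-likelihood with a vector parameter
    # theta = c(alpha, beta); assumes the data are already in the vector x
    negloglike <- function(theta) {
        -sum(dgamma(x, shape=theta[1], rate=theta[2], log=TRUE))
    }

    # method-of-moments starting values (alpha/beta = mean, alpha/beta^2 = variance)
    beta0  <- mean(x)/var(x)
    alpha0 <- mean(x)*beta0

    out <- nlm(negloglike, p=c(alpha0, beta0), hessian=TRUE)
    out$estimate   # c(alpha_hat_mle, beta_hat_mle)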