Parameter estimation for nonlinear models: Numerical approaches to solving the inverse problem. Lecture 11 04/01/2008. Sven Zenker



Review: Transformation of random variables

Consider the probability distribution of a random variable X described by a PDF $f_X(x)$ defined on $\mathbb{R}^n$. How can we find the PDF $f_Y(y)$ of a new random variable we arrive at by applying an invertible function $T:\mathbb{R}^n \to \mathbb{R}^n$, $y = T(x)$, to the original one? Consider the probability of y being in some subset S:

$$P(y \in S) = \int_S f_Y(y)\,dy,$$

which we could compute if we knew $f_Y$. Since T is invertible, we can express the above integral in terms of $f_X(x)$ as follows:

$$P(y \in S) = \int_S f_Y(y)\,dy = \int_{T^{-1}(S)} f_X(x)\,dx$$

and change variables to y to obtain

$$P(y \in S) = \int_S f_Y(y)\,dy = \int_{T^{-1}(S)} f_X(x)\,dx = \int_S f_X(T^{-1}(y))\,\lvert\det DT^{-1}(y)\rvert\,dy,$$

so we see that

$$f_Y(y) = f_X(T^{-1}(y))\,\lvert\det DT^{-1}(y)\rvert.$$
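The change-of-variables formula can be checked numerically in one dimension. The sketch below is a hypothetical example (not from the lecture): it takes X ~ N(0, 1) and T(x) = exp(x), in which case $f_Y$ should be the standard log-normal density.

```python
import math

# Hypothetical 1-D check of f_Y(y) = f_X(T^{-1}(y)) |det DT^{-1}(y)|:
# X ~ N(0, 1), T(x) = exp(x), so T^{-1}(y) = log(y) and |dT^{-1}/dy| = 1/y.

def f_X(x):
    # standard normal density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def f_Y(y):
    # transformed density via the change-of-variables formula
    return f_X(math.log(y)) * (1.0 / y)

# f_Y should equal the standard log-normal density, e.g. at y = 2:
lognormal = math.exp(-math.log(2.0) ** 2 / 2) / (2.0 * math.sqrt(2 * math.pi))
print(abs(f_Y(2.0) - lognormal) < 1e-12)  # True
```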

Review: Expected value

For a random variable X described by a probability density function $f_X(x)$, the expected value is

$$E(X) = \int x\, f_X(x)\,dx$$

The expected value of a function $g(X)$ of the random variable is

$$E(g(X)) = \int g(x)\, f_X(x)\,dx$$

Review: Bayesian inference for continuous variables

$$f_{X\mid Y}(x \mid y) = \frac{f_{Y\mid X}(y \mid x)\, f_X(x)}{\int f_{Y\mid X}(y \mid x)\, f_X(x)\,dx}$$

is the probability density function of the posterior distribution.

Review: Prior distributions

- Prior distributions have to be known or assumed to be able to perform Bayesian inference
- In the Bayesian spirit, they can be used to encode prior beliefs about parameter/state values
- Approaches to objectifying prior selection by formally describing a state of minimal information, typically based on invariance arguments or information-theoretic ideas, exist (Jeffreys prior, reference priors)
- In practice, if sufficient data are available, the effect of prior selection may be small
- This can be explored experimentally

Sampling to tackle high-dimensional problems

A way out: sample-based approximation... If we can obtain a set of samples $\{X_1, \ldots, X_N\}$ from the posterior distribution $\pi(x)$, then

$$E(f(x)) = \int f(x)\,\pi(x)\,dx \approx \frac{1}{N}\sum_{i=1}^{N} f(X_i)$$
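As a minimal illustration of the sample-based approximation (a hypothetical example, not part of the lecture), one can estimate $E(X^2)$ for X ~ Uniform(0, 1), whose true value is $\int_0^1 x^2\,dx = 1/3$:

```python
import random

random.seed(0)

# Sample-based approximation of E(f(X)) with f(x) = x^2 and
# X ~ Uniform(0, 1); the exact expectation is 1/3.
N = 100_000
samples = [random.random() for _ in range(N)]
estimate = sum(x * x for x in samples) / N
print(abs(estimate - 1.0 / 3.0) < 0.01)  # True: the Monte Carlo error is small
```

Here the samples are i.i.d.; the MCMC samples discussed later satisfy the same averaging formula even though they are correlated.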

Review homework

Key observations:
- Product of marginals is equal to the joint PDF iff the random variables are independent
- Sample-based approximation of marginal PDFs works and gets better with larger sample size
- Histogram normalization factor: $c_{bin} = \dfrac{n_{bin}}{w_{bin}\,N}$
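The normalization factor can be verified in a short sketch (a hypothetical example): dividing each bin count by (bin width × sample size) turns counts into density values, so the normalized histogram integrates to 1.

```python
import random

random.seed(0)

# Check of the normalization c_bin = n_bin / (w_bin * N) on a
# Uniform(0, 1) sample binned into 20 equal-width bins.
N, lo, hi, nbins = 10_000, 0.0, 1.0, 20
w = (hi - lo) / nbins
counts = [0] * nbins
for _ in range(N):
    u = random.random()                           # Uniform(0, 1) sample
    counts[min(int((u - lo) / w), nbins - 1)] += 1
density = [c / (w * N) for c in counts]           # c_bin for each bin
print(abs(sum(d * w for d in density) - 1.0) < 1e-9)  # True
```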

From sample to PDF: Density estimation

The histogram is an instance of the more general problem of density estimation: Given a finite set of samples from a probability distribution, how can I approximately find the PDF of the underlying distribution? The histogram is one (relatively crude) way.

Histogram: bin width selection

Various formulas based on asymptotic arguments exist. All of them (of course) have the bin width decrease with increasing number of samples, and also take into account some measure of the spread of the data... E.g.:

$$w_{bin} = \frac{7s}{2}\, n^{-1/3} \quad \text{(Scott's rule)}$$

where n is the number of samples and s the standard deviation of the sample, or

$$w_{bin} = 2\,\mathrm{IQR}\; n^{-1/3} \quad \text{(Freedman–Diaconis rule)}$$

where IQR is the interquartile range of the data.

These formulas give one bin width for the entire dataset. A variety of methods exist to further refine the bin widths by allowing them to change locally.
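Both rules are easy to compute with the standard library; the sketch below (illustrative helper names, not from the lecture) implements them directly from the formulas above:

```python
import statistics

def scott_width(sample):
    # Scott's rule as on the slide: w_bin = (7 s / 2) * n^(-1/3) = 3.5 s n^(-1/3)
    n, s = len(sample), statistics.stdev(sample)
    return 3.5 * s * n ** (-1 / 3)

def fd_width(sample):
    # Freedman-Diaconis rule: w_bin = 2 * IQR * n^(-1/3)
    n = len(sample)
    q1, _, q3 = statistics.quantiles(sample, n=4)  # quartile cut points
    return 2 * (q3 - q1) * n ** (-1 / 3)

s = [float(i) for i in range(100)]
print(scott_width(s), fd_width(s))  # both around 22 for this sample
```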

Another approach: Kernel density estimation

Different idea: rather than counting samples in a given range, a probability density bump is placed at the location of each sample (the kernel), the idea being that the presence of a sample supports a higher density around the position of that sample. These kernels are all added up and normalized.

(source: Wikimedia Commons, http://en.wikipedia.org/wiki/Image:Parzen_window_illustration.png)

Kernel density estimation (KDE)

For an i.i.d. sample $\{x_1, \ldots, x_N\}$, the (fixed bandwidth) kernel density estimate for the PDF of the underlying distribution is given by

$$\hat f(x) = \frac{1}{Nh} \sum_{i=1}^{N} K\!\left(\frac{x - x_i}{h}\right),$$

where $K(x)$ is a kernel function satisfying $\int K(x)\,dx = 1$. Usually, one will also require $K(x) \ge 0$ and $K(x) = K(-x)$ for all x.

h is termed the bandwidth. Its selection is critical for the performance of the estimator and much more important than the specific shape of the kernel used.
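The estimator transcribes almost literally into code. This is an illustrative Python sketch (function names are my own) using a Gaussian kernel; the final check confirms the estimate integrates to 1:

```python
import math

def gaussian_kernel(u):
    # standard normal density as the kernel K(u)
    return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)

def kde(x, sample, h):
    # fixed-bandwidth kernel density estimate f_hat(x)
    n = len(sample)
    return sum(gaussian_kernel((x - xi) / h) for xi in sample) / (n * h)

# The estimate integrates to 1 (Riemann sum over a wide grid):
sample = [-1.0, 0.0, 0.5, 2.0]
total = sum(kde(-10 + 0.01 * k, sample, 0.5) * 0.01 for k in range(2000))
print(abs(total - 1.0) < 1e-3)  # True
```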

KDE: Effect of bandwidth

Source: Wikimedia Commons, http://en.wikipedia.org/wiki/Image:Kernel_density.svg

KDE: Adaptive bandwidth selection

- Bandwidth can also be chosen for each individual sample separately, leading to much improved estimates (lower density -> wider kernels, higher density -> narrower kernels)
- The optimal bandwidth selection problem in higher dimensions is still an active area of research
- Important idea: penalized MLE approaches (Why? Dirac catastrophe)

From PDF to sample: the sampling problem

We wish to create a set of i.i.d. samples given a PDF known up to a constant scalar multiple, e.g. to solve

$$f_{X\mid Y}(x \mid y) = \frac{f_{Y\mid X}(y \mid x)\, f_X(x)}{\int f_{Y\mid X}(y \mid x)\, f_X(x)\,dx}$$

The denominator (normalization constant) is usually not known.

Markov Chain Monte Carlo

- While ideas like sampling from the uniform distribution on [0,1] and then transforming with the inverse CDF work for simple PDFs known in closed form, they are hopeless for realistic scenarios
- The techniques we will discuss will allow us to obtain samples that, for a large number of samples, are distributed according to the PDF we provide (our Bayesian posterior, e.g.)
- The individual samples will not be independent

Markov chains for continuous state spaces

Idea: define a discrete-time stochastic process on a continuous state space, that is, a description of a particle moving around in our continuous state space in a random fashion, e.g., in the following way:

A transition kernel $P(x, A)$ with $x \in \mathbb{R}^n$, $A \in \Sigma$, $\Sigma$ the Borel sigma field over $\mathbb{R}^n$ (i.e., A is a subset of $\mathbb{R}^n$ that is "nice enough"), gives the probability that our stochastic process moves from x to a point in the set A. P is a conditional distribution function giving the probability of moving to a certain subset of $\mathbb{R}^n$, given that we are at x. In particular, $P(x, \mathbb{R}^n) = 1$, that is, we are certain to move somewhere... However, $P(x, \{x\})$ need not be zero, that is, our chain can stay where it is...

Invariant distributions

A large body of theory exists that treats 2 key questions arising when considering the long-term behavior of such Markov chains in continuous state spaces:

a) existence of an invariant distribution $\pi^*$ with density $\pi$ with respect to Lebesgue measure, that is $\pi^*(dy) = \pi(y)\,dy$, satisfying

$$\pi^*(dy) = \int P(x, dy)\,\pi(x)\,dx$$

that is, the invariant distribution does not change when the transition kernel acts on it...

b) conditions under which an iteration of the transition kernel given by

$$P^{(n)}(x, A) = \int P^{(n-1)}(x, dy)\, P(y, A), \qquad P^{(1)}(x, dy) = P(x, dy)$$

converges to the stationary distribution $\pi^*$.

Markov Chain Monte Carlo

Idea: Create a transition kernel that has the distribution we wish to sample from as its invariant distribution and converges to it. Seems hard, but can in fact be done.

Markov Chain Monte Carlo

Suppose the transition kernel takes the form

$$P(x, dy) = p(x, y)\,dy + r(x)\,\delta_x(dy)$$

where $p(x, x) = 0$, $\delta_x(dy) = 1$ if $x \in dy$ and 0 otherwise, and

$$r(x) = 1 - \int p(x, y)\,dy$$

is the probability that the chain remains at x.

Markov Chain Monte Carlo (MCMC)

Now, if $p(x, y)$ satisfies the so-called "detailed balance" condition

$$\pi(x)\,p(x, y) = \pi(y)\,p(y, x)$$

then $\pi(\cdot)$ is the invariant density of $P(x, \cdot)$.
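That detailed balance implies invariance can be checked numerically on a small discrete chain, a finite-state stand-in for the continuous kernel (a hypothetical example, not from the lecture):

```python
# Toy 3-state chain: build P from Metropolis-style acceptance so that
# pi[i] * P[i][j] == pi[j] * P[j][i] (detailed balance), then verify pi P == pi.
pi = [0.2, 0.3, 0.5]
P = [[0.0] * 3 for _ in range(3)]
for i in range(3):
    for j in range(3):
        if i != j:
            P[i][j] = 0.2 * min(1.0, pi[j] / pi[i])  # propose w.p. 0.2, accept w.p. alpha
    P[i][i] = 1.0 - sum(P[i])                        # probability of staying put, r(x)

# Invariance: (pi P)[j] == pi[j] for all j
piP = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]
print(all(abs(a - b) < 1e-12 for a, b in zip(piP, pi)))  # True
```

Detailed balance holds here because $\pi_i P_{ij} = 0.2\,\min(\pi_i, \pi_j)$ is symmetric in i and j, so invariance follows automatically, mirroring the continuous-state argument.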

MCMC

Evaluate the RHS of the stationarity condition, using the assumed form of the transition kernel and linearity of integration (line 1), switching the order of integration (line 2), and using the assumed detailed balance condition (line 3), to see that if p(x, y) in the proposed transition kernel satisfies detailed balance, $\pi$ is indeed a stationary distribution of the Markov chain...

From Chib and Greenberg, The American Statistician 1995, Vol. 49(4), 327-335

How to arrive at a transition kernel that satisfies detailed balance

We can choose some arbitrary family of probability densities $q(x, y)$ that can depend only on the current state (Markov property!). These will in general not satisfy the detailed balance criterion needed for the chain to have the desired target distribution $\pi$ which we are aiming to sample from (e.g., our Bayesian posterior). For example, it could be that

$$\pi(x)\,q(x, y) > \pi(y)\,q(y, x)$$

So, roughly speaking, we move from x to y too often, and not frequently enough from y to x. A convenient way to compensate for this, which leads to Metropolis-Hastings algorithms, is to introduce a probability $\alpha(x, y) \le 1$ that the move from x to y is actually made. If the move is not made, the process stays at x. So the specific form for $p(x, y)$ in the Metropolis-Hastings algorithm becomes

$$p_{MH}(x, y) = q(x, y)\,\alpha(x, y), \qquad x \ne y$$

For the case where the inequality is as above, we may wish to set $\alpha(y, x) = 1$, the largest possible value for a probability, and can then compute $\alpha(x, y)$ from the detailed balance condition

$$\pi(x)\,q(x, y)\,\alpha(x, y) = \pi(y)\,q(y, x)\,\alpha(y, x)$$

to obtain

$$\alpha(x, y) = \frac{\pi(y)\,q(y, x)}{\pi(x)\,q(x, y)},$$

and similarly for the inequality in the other direction by setting $\alpha(x, y) = 1$.

How to arrive at a transition kernel that satisfies detailed balance

So, to obtain detailed balance/reversibility, we choose

$$\alpha(x, y) = \begin{cases} \min\!\left(\dfrac{\pi(y)\,q(y, x)}{\pi(x)\,q(x, y)},\, 1\right) & \text{if } \pi(x)\,q(x, y) > 0 \\[1ex] 1 & \text{otherwise} \end{cases}$$

which, together with the probability of staying at the current position

$$r(x) = 1 - \int q(x, y)\,\alpha(x, y)\,dy$$

yields an overall transition kernel

$$P_{MH}(x, dy) = q(x, y)\,\alpha(x, y)\,dy + \left[1 - \int q(x, y)\,\alpha(x, y)\,dy\right]\delta_x(dy)$$

which is a special case of the form from the previous slide, and for which we saw that it does have the desired invariant distribution, since we have detailed balance/reversibility by construction...

Observations

- For a symmetric proposal distribution, that is $q(x, y) = q(y, x)$, $\alpha(x, y) = \min\!\left(\frac{\pi(y)}{\pi(x)}, 1\right)$, so that "uphill moves" will always be accepted (simulated annealing!)
- For $q(x, y) = \pi(y)$, $\alpha(x, y) = 1$, so if the proposal distribution is the true distribution we wish to sample from, we will always accept the move
- The PDF of the distribution of interest $\pi$ need only be known up to a constant scalar factor, since it appears both in numerator and denominator

Algorithm

Initialization:
- Specify family of proposal distributions q(x, y)
- Desired number of samples N
- Initial value x_0

Main loop:
Repeat for j = 0, ..., N-1:
- Generate (sample) y from q(x_j, .) and u from the uniform distribution on [0,1]
- If u <= α(x_j, y), set x_{j+1} = y
- Else, set x_{j+1} = x_j

Termination:
Return the set of samples {x_1, ..., x_N}
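The loop above can be sketched as follows, here in Python with a symmetric Gaussian random-walk proposal and an unnormalized standard-normal target (the assignment itself asks for MATLAB; all names here are illustrative):

```python
import math
import random

random.seed(1)

def mh_symm(n_samples, x0, proposal_sigma, density):
    """Metropolis-Hastings with a symmetric Gaussian random-walk proposal.
    Returns (samples, number of accepted moves)."""
    x = x0
    samples, accepted = [], 0
    for _ in range(n_samples):
        y = x + random.gauss(0.0, proposal_sigma)  # sample y from q(x, .)
        # symmetric q: alpha(x, y) = min(pi(y) / pi(x), 1)
        if random.random() <= min(density(y) / density(x), 1.0):
            x, accepted = y, accepted + 1
        samples.append(x)
    return samples, accepted

# Target known only up to a constant factor: unnormalized standard normal
target = lambda x: math.exp(-x * x / 2)
samples, accepted = mh_symm(50_000, 0.0, 1.0, target)
mean = sum(samples) / len(samples)
print(abs(mean) < 0.15 and 0 < accepted < 50_000)
```

Note that the correlated samples still average to the right value (the chain's mean is close to 0 here), and that the normalization constant of the target was never needed.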

Assignment no. 9

Implement the described Metropolis-Hastings sampling algorithm for a generic family of symmetric proposal distributions q(x, y) and target density in MATLAB; the function header could for example look like this:

function [samples, accept] = mh_symm(nsamples, x0, sampleq, dens)
% mh_symm implements a simple Metropolis-Hastings sampler for the PDF dens using the
% symmetric proposal distribution from which sampleq samples
% nsamples - number of iterations/correlated samples to be drawn
% x0       - column vector giving initial position in parameter space
% sampleq  - function sampleq(x) should return a sample from the proposal
%            distribution for current position x
% dens     - function dens(x) should return the density of the PDF corresponding
%            to the distribution we wish to sample
% returns
% samples  - length(x0) x nsamples array of samples
% accept   - number of accepted steps

Explore its behavior by utilizing the 2-dimensional test distributions provided for homework no. 7 as target distributions and running the sampler using symmetric bivariate normal distributions with identical marginal standard deviations sigma as proposal distribution (you can sample from these by simply sampling from the marginals using randn, thanks to independence). Run the sampler for 100000 iterations for proposal sigma = 0.01, 0.1, 1, 10, 15, 20, and 40, respectively, using [0;0] for x0 every time for comparability, for both distributions.

Compute the acceptance ratio, that is (no. of accepted steps)/(total number of iterations), for each value of sigma. What do you observe?

Use your (corrected, if necessary) normalized histogram code from homework no. 7 to compare the estimated marginal densities you obtain from each sample with the true marginal densities. What do you observe for the different choices of sigma for the 2 distributions? How do you interpret your overall findings?