APPLICATIONS OF BAYES THEOREM




C&PE 940, September 2005

Geoff Bohling
Assistant Scientist
Kansas Geological Survey
geoff@kgs.ku.edu
864-2093

Notes, overheads, and the Excel example file are available at http://people.ku.edu/~gbohling/cpe940

Development of Bayes Theorem

Terminology:

P(A): probability of occurrence of event A (marginal)
P(B): probability of occurrence of event B (marginal)
P(A,B): probability of simultaneous occurrence of events A and B (joint)
P(A|B): probability of occurrence of A given that B has occurred (conditional)
P(B|A): probability of occurrence of B given that A has occurred (conditional)

Relationship of joint probability to conditional and marginal probabilities:

P(A,B) = P(A|B) P(B)   or   P(A,B) = P(B|A) P(A)

So P(A|B) P(B) = P(B|A) P(A).
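As a quick numerical check of the product rule (not part of the original notes), the sketch below uses made-up probabilities for two events and confirms that both factorizations recover the same joint probability.

```python
# Minimal sketch with hypothetical numbers: P(A,B) = P(A|B)*P(B) = P(B|A)*P(A)
p_joint = 0.12   # assumed joint probability P(A,B)
p_A = 0.30       # assumed marginal P(A)
p_B = 0.40       # assumed marginal P(B)

p_A_given_B = p_joint / p_B   # conditional P(A|B)
p_B_given_A = p_joint / p_A   # conditional P(B|A)

print(p_A_given_B * p_B)   # about 0.12, recovers P(A,B)
print(p_B_given_A * p_A)   # about 0.12, recovers P(A,B)
```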

Rearranging gives the simplest statement of Bayes theorem:

P(B|A) = P(A|B) P(B) / P(A)

Often, B represents an underlying model or hypothesis and A represents observable consequences or data, so Bayes theorem can be written schematically as

P(model|data) ∝ P(data|model) P(model)

This lets us turn a statement about the forward problem,

P(data|model): the probability of obtaining the observed data given a certain model,

into a statement about the corresponding inverse problem,

P(model|data): the probability that a certain model gave rise to the observed data,

as long as we are willing to make some guesses about the probability of occurrence of that model, P(model), prior to taking the data into account.

Or graphically, Bayes theorem lets us turn information about the probability of different effects from each possible cause into information about the probable cause given the observed effects. (Illustration styled after Sivia, 1996.)

Assume that Bi represents one of n possible mutually exclusive events and that the conditional probability for the occurrence of A given that Bi has occurred is P(A|Bi). In this case, the total probability for the occurrence of A is

P(A) = Σ_{i=1}^{n} P(A|Bi) P(Bi)

and the conditional probability that event Bi has occurred, given that event A has been observed to occur, is

P(Bi|A) = P(A|Bi) P(Bi) / Σ_{j=1}^{n} P(A|Bj) P(Bj) = P(A|Bi) P(Bi) / P(A).

That is, if we assume that event A arises, with probability P(A|Bi), from each of the underlying states Bi, i = 1, 2, ..., n, we can use our observation of the occurrence of A to update our a priori assessment of the probability of occurrence of each state, P(Bi), to an improved a posteriori estimate, P(Bi|A).
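This update is easy to express in code. The sketch below (the function name and example numbers are illustrative, not from the notes) computes the posteriors for any number of mutually exclusive states.

```python
# Sketch: Bayes update over n mutually exclusive states B_1, ..., B_n
def bayes_update(priors, likelihoods):
    """priors[i] = P(B_i); likelihoods[i] = P(A | B_i); returns P(B_i | A)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)                    # total probability P(A)
    return [j / total for j in joint]

# Illustrative three-state example with made-up numbers
print(bayes_update([0.5, 0.3, 0.2], [0.1, 0.4, 0.8]))
```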

Discrete-Probability Example: Dolomite/Shale Discrimination Using a Gamma Ray Log Threshold

Reservoir with dolomite pay zones and shale non-pay zones.

Gamma ray log: measures the natural radioactivity of the rock; measured in API units.

Shales: typically high gamma ray (~120 API units) due to the abundance of radioactive isotopes in clay minerals; somewhat lower in this reservoir (~80 API units) due to high silt content.

Dolomite: typically low gamma ray (~10-25 API units), but some hot intervals due to uranium.

We can characterize the gamma ray distribution for each lithology based on core samples from wells in the field:

            Dolomite   Shale
Mean           25.8     85.2
Std. Dev.      18.6     14.9
Count           476      295

[Figure: Gamma ray probability density distributions (kernel density estimates) for dolomite and shale, 0-160 API units.]

We will use these distributions to predict lithology from the gamma ray log in uncored wells, first using a simple rule:
- if GammaRay > 60, call the logged interval a shale
- if GammaRay < 60, call it a dolomite
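The threshold rule itself is trivial to code; this small sketch just restates the 60 API cutoff.

```python
def classify_by_threshold(gamma_ray, cutoff=60.0):
    """Simple threshold rule: shale above the cutoff, dolomite below."""
    return "shale" if gamma_ray > cutoff else "dolomite"

print(classify_by_threshold(85.0))   # shale
print(classify_by_threshold(25.0))   # dolomite
```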

Using Bayes rule, we can determine the posterior probability of occurrence of dolomite and shale given that we have actually observed a gamma ray value greater than 60. Let's define events and probabilities as follows:

A: GammaRay > 60
B1: occurrence of dolomite
B2: occurrence of shale
P(B1): prior probability for dolomite based on overall prevalence, 60% (476 of 771 core samples)
P(B2): prior probability for shale based on overall prevalence, 40% (295 of 771 core samples)
P(A|B1): probability of GammaRay > 60 in a dolomite = 7% (34 of 476 dolomite samples)
P(A|B2): probability of GammaRay > 60 in a shale = 95% (280 of 295 shale samples)

Then the denominator in Bayes theorem, the total probability of A, is given by

P(A) = P(A|B1) P(B1) + P(A|B2) P(B2) = 0.07*0.60 + 0.95*0.40 = 0.42
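Written out in code, a short sketch that just re-does the arithmetic above:

```python
# Total probability of observing GammaRay > 60 (event A)
p_dolomite, p_shale = 0.60, 0.40       # priors P(B1), P(B2)
p_high_given_dolomite = 0.07           # P(A|B1), 34 of 476 dolomite samples
p_high_given_shale = 0.95              # P(A|B2), 280 of 295 shale samples

p_A = p_high_given_dolomite * p_dolomite + p_high_given_shale * p_shale
print(p_A)   # about 0.42
```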

If we measure a gamma ray value greater than 60 at a certain depth in a well, then the probability that we are logging a dolomite interval is

P(B1|A) = P(A|B1) P(B1) / P(A) = 0.07*0.60 / 0.42 = 0.10

and the probability that we are logging a shale interval is

P(B2|A) = P(A|B2) P(B2) / P(A) = 0.95*0.40 / 0.42 = 0.90.

Thus, our observation of a high gamma ray value has changed our assessment of the probabilities of occurrence of dolomite and shale from 60% and 40%, based on our prior estimates of overall prevalence, to 10% and 90%.

We can also do a simple sensitivity analysis with respect to the prior probabilities. For example, if we take the prior probability for shale to be 20% (meaning the prior for dolomite is 80%), then we get a posterior probability of 77% for shale (23% for dolomite) if the gamma ray value is greater than 60 API units.
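A short sketch of the same posterior calculation, written so the prior can be varied for the sensitivity check:

```python
# Posterior probability of shale given GammaRay > 60, for any prior on shale
def posterior_shale_given_high_gr(prior_shale,
                                  p_high_given_shale=0.95,
                                  p_high_given_dolomite=0.07):
    prior_dolomite = 1.0 - prior_shale
    p_high = (p_high_given_shale * prior_shale
              + p_high_given_dolomite * prior_dolomite)   # total probability of A
    return p_high_given_shale * prior_shale / p_high

print(round(posterior_shale_given_high_gr(0.40), 2))   # about 0.90 (base case)
print(round(posterior_shale_given_high_gr(0.20), 2))   # about 0.77 (sensitivity case)
```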

Continuous-Probability Example: Dolomite/Shale Discrimination Using Gamma Ray Density Functions

It is also possible to formulate Bayes theorem using probability density functions in place of the discrete probabilities P(A|Bi). We could represent the probability density function that a continuous variable, X, follows in each case as f(x|Bi) or, more compactly, fi(x). Then

P(Bi|x) = fi(x) P(Bi) / Σ_{j=1}^{n} fj(x) P(Bj).

That is, if we can characterize the distribution of X for each category, Bi, we can use the above equation to compute the probability that event Bi has occurred given that the observed value of X is x. For example, based on the observed distribution of gamma ray values for dolomites and shales, a gamma ray measurement of 120 API units almost certainly arises from a shale interval, because the probability density function for gamma ray in dolomites evaluated at 120 API units, f1(x = 120), is essentially 0. This form of Bayes theorem lets us develop a continuous mapping from gamma ray value to posterior probability.
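The density-based update has the same shape as the discrete one; here is a small sketch in which the class-conditional pdfs are passed in as callables (the function name and the toy densities are illustrative, not from the notes).

```python
# Sketch: posterior over classes from class-conditional pdfs evaluated at x
def posterior_from_densities(x, priors, pdfs):
    """priors[i] = P(B_i); pdfs[i] is a callable f_i(x); returns P(B_i | x)."""
    weighted = [prior * pdf(x) for prior, pdf in zip(priors, pdfs)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Toy example: uniform density on [0, 1] versus uniform density on [0, 0.5]
print(posterior_from_densities(
    0.3,
    [0.5, 0.5],
    [lambda x: 1.0, lambda x: 2.0 if 0.0 <= x <= 0.5 else 0.0]))
```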

Shale/Dolomite Discrimination Using Normal Density Functions

               Dolomite (1)   Shale (2)
Mean (x̄)           25.8         85.2
Std. Dev. (s)       18.6         14.9
Count                476          295

f1(x) = [1 / (s1 √(2π))] exp[-(x - x̄1)² / (2 s1²)]
f2(x) = [1 / (s2 √(2π))] exp[-(x - x̄2)² / (2 s2²)]

[Figure: Normal approximations for the gamma ray distributions, comparing the kernel density estimates with the fitted normal densities for dolomite and shale, 5-155 API units.]
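The two normal approximations can be coded directly from the means and standard deviations in the table above (a sketch; the function names are mine, not from the notes).

```python
import math

def normal_pdf(x, mean, sd):
    """Normal probability density function."""
    return math.exp(-(x - mean) ** 2 / (2.0 * sd ** 2)) / (sd * math.sqrt(2.0 * math.pi))

def f_dolomite(x):   # f1(x): mean 25.8, std. dev. 18.6
    return normal_pdf(x, 25.8, 18.6)

def f_shale(x):      # f2(x): mean 85.2, std. dev. 14.9
    return normal_pdf(x, 85.2, 14.9)

print(f_dolomite(30.0), f_shale(30.0))   # dolomite density dominates at low gamma ray
```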

Let q2 = P(B2) represent the prior probability for shale; the prior for dolomite is then P(B1) = 1 - q2.
Let p2(x) = P(B2|x) represent the posterior probability for shale; the posterior for dolomite is then P(B1|x) = 1 - p2(x).

So, the posterior probability for shale given that the observed gamma ray value is x is

p2(x) = q2 f2(x) / [(1 - q2) f1(x) + q2 f2(x)]

[Figure: Shale occurrence probability using normal densities. Posterior probability for shale (solid lines, one curve for each prior probability used to compute the posterior) and the normal pdfs for dolomite and shale (dashed lines) versus gamma ray, 0-150 API units; the base-case 50% posterior point falls at 59.6 API units.]
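Combining the prior with the two fitted normal densities gives the continuous posterior curve; a self-contained sketch, reusing the means and standard deviations from the table:

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-(x - mean) ** 2 / (2.0 * sd ** 2)) / (sd * math.sqrt(2.0 * math.pi))

def posterior_shale(x, prior_shale=0.40):
    """p2(x) = q2*f2(x) / [(1 - q2)*f1(x) + q2*f2(x)] with normal f1, f2."""
    f1 = normal_pdf(x, 25.8, 18.6)   # dolomite density
    f2 = normal_pdf(x, 85.2, 14.9)   # shale density
    return prior_shale * f2 / ((1.0 - prior_shale) * f1 + prior_shale * f2)

for gr in (20, 40, 59.6, 80, 120):
    print(gr, round(posterior_shale(gr), 3))   # posterior rises with gamma ray
```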

Bayes rule allocation: assign each observation to the class with the highest posterior probability.

For the base-case prior of 40% for shale, the 50% posterior probability point occurs at a gamma ray value of 59.6, so Bayes rule allocation leads to essentially the same results as thresholding at 60 API units. But we now have a means of converting the gamma ray log into a continuous shale-probability log.
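The 59.6 API crossover can be reproduced by a simple bisection on the posterior curve; this sketch repeats the helper definitions so it runs on its own, and the exact value it returns depends on the fitted densities and the prior.

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-(x - mean) ** 2 / (2.0 * sd ** 2)) / (sd * math.sqrt(2.0 * math.pi))

def posterior_shale(x, prior_shale=0.40):
    f1 = normal_pdf(x, 25.8, 18.6)   # dolomite density
    f2 = normal_pdf(x, 85.2, 14.9)   # shale density
    return prior_shale * f2 / ((1.0 - prior_shale) * f1 + prior_shale * f2)

# Bisection for the gamma ray value where the posterior for shale crosses 0.5
lo, hi = 30.0, 80.0   # bracket over which the posterior increases monotonically
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if posterior_shale(mid) < 0.5:
        lo = mid
    else:
        hi = mid
print(round(0.5 * (lo + hi), 1))   # approximately 59.6
```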

Shale/Dolomite Discrimination Using Kernel Density Estimates

There is no need to restrict the approach to normal densities. We could use any other form of probability density function for each category, including the kernel density estimates shown initially.

[Figure: Shale occurrence probability using kernel densities. Posterior probability for shale (solid lines, one curve for each prior probability used to compute the posterior) and the kernel pdfs for the two lithologies (dashed lines) versus gamma ray, 0-150 API units.]
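With the raw core measurements in hand, kernel density estimates can stand in for the normal pdfs. The sketch below uses scipy.stats.gaussian_kde on synthetic stand-in data, since the core measurements themselves are not reproduced in these notes.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Stand-in samples; in practice these would be the core gamma ray measurements
dolomite_gr = rng.normal(25.8, 18.6, size=476)
shale_gr = rng.normal(85.2, 14.9, size=295)

f_dolomite = gaussian_kde(dolomite_gr)   # kernel density estimate for dolomite
f_shale = gaussian_kde(shale_gr)         # kernel density estimate for shale

def posterior_shale(x, prior_shale=0.40):
    f1, f2 = f_dolomite(x)[0], f_shale(x)[0]
    return prior_shale * f2 / ((1.0 - prior_shale) * f1 + prior_shale * f2)

print(round(posterior_shale(70.0), 3))   # posterior for shale at 70 API units
```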

Relationship to Discriminant Analysis

We could just as easily use multivariate density functions in Bayes theorem. For example, we could be discriminating facies based on a vector of log measurements, x, rather than a single log.

If we use multivariate normal density functions for each class, Bayes rule allocation leads to classical discriminant analysis. Assuming the covariance matrices are equal across classes leads to linear discriminant analysis: Bayes rule allocation draws linear boundaries between classes in x space. Assuming unequal covariance matrices leads to quadratic discriminant analysis: Bayes rule allocation draws quadratic boundaries between classes.
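Standard libraries implement both cases directly. As a sketch (all of the data here are synthetic and the second log, a bulk density, is purely illustrative), scikit-learn's LinearDiscriminantAnalysis pools the covariance matrices while QuadraticDiscriminantAnalysis fits one per class.

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)

rng = np.random.default_rng(1)
# Synthetic stand-in: two log measurements (gamma ray, bulk density) per sample
dolomite = rng.multivariate_normal([25.8, 2.80], [[350.0, 0.0], [0.0, 0.01]], size=476)
shale = rng.multivariate_normal([85.2, 2.55], [[220.0, 0.0], [0.0, 0.01]], size=295)
X = np.vstack([dolomite, shale])
y = np.array([0] * len(dolomite) + [1] * len(shale))   # 0 = dolomite, 1 = shale

lda = LinearDiscriminantAnalysis().fit(X, y)      # pooled covariance, linear boundaries
qda = QuadraticDiscriminantAnalysis().fit(X, y)   # per-class covariance, quadratic boundaries

print(lda.predict_proba([[60.0, 2.7]]))   # posterior class probabilities at one point
print(qda.predict_proba([[60.0, 2.7]]))
```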