Probability and statistics; Rehearsal for pattern recognition




Probability and statistics; Rehearsal for pattern recognition
Václav Hlaváč
Czech Technical University in Prague, Faculty of Electrical Engineering, Department of Cybernetics, Center for Machine Perception
http://cmp.felk.cvut.cz/~hlavac, hlavac@fel.cvut.cz
Courtesy: T. Brox, V. Franc, M. Navara, M. Urban.
Outline of the talk: Probability vs. statistics. Random events. Probability: joint, conditional. Bayes theorem. Distribution function, density. Characteristics of a random variable.

Recommended reading
A. Papoulis: Probability, Random Variables and Stochastic Processes, McGraw Hill, 4th edition, 2002.
http://mathworld.wolfram.com/
http://www.statsoft.com/textbook/stathome.html

Probability, motivating example
A lottery ticket is sold for the price of EUR 2. One lottery ticket out of 1000 wins EUR 1000; the other lottery tickets win nothing. This gives the value of the lottery ticket after the draw. For what price should the lottery ticket be sold before the draw? Only a fool would buy the lottery ticket for EUR 2. (Or not?) The value of the lottery ticket before the draw is (1/1000) · 1000 = EUR 1, i.e., the average value after the draw. Probability theory is used here.
A lottery question: Why are lottery tickets being bought? Why do lotteries prosper?
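The fair price computed above is just an expected value. A minimal sketch in Python, using the payoffs and probabilities from the example:

```python
# Lottery from the example: 1 ticket in 1000 wins EUR 1000, the rest win nothing.
outcomes = [(1000.0, 1 / 1000), (0.0, 999 / 1000)]  # (payoff, probability) pairs

# The fair price before the draw is the expected payoff.
expected_value = sum(payoff * p for payoff, p in outcomes)
print(expected_value)  # 1.0, i.e. EUR 1 -- less than the EUR 2 ticket price
```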

Statistics, motivating example
We have assumed so far that the parameters of the probability model are known. However, this is seldom fulfilled.
Example Lotto: One typically loses while playing Lotto because the winnings are set according to the number of winners. It is advantageous to bet differently than the others. To do so, one needs to know what model the others use.
Example Roulette: Both parties are interested in whether all the numbers occur with the same probability; more precisely, what the differences from the uniform probability distribution are. How can this be learned? What is the risk of wrong conclusions? Statistics is used here.

Probability, statistics
Probability: probabilistic model → future behavior. It is a theory (tool) for purposeful decisions when the outcome of future events depends on circumstances we know only partially and randomness plays a role. An abstract model of uncertainty; description and quantification of the results.
Statistics: behavior of the system → probabilistic representation. It is a tool for seeking a probabilistic description of real systems based on observing and testing them. It provides more: a tool for investigating the world, seeking and testing dependencies which are not apparent. Two types: descriptive and inference statistics. Collection, organization and analysis of data. Generalization from restricted / finite samples.

Random events, concepts
An experiment with random outcome: states of nature, possibilities, experimental results, etc.
A sample space is a nonempty set Ω of all possible outcomes of the experiment.
Elementary events ω ∈ Ω are the elements of the sample space (outcomes of the experiment).
A space of events A is composed of the system of all subsets of the sample space Ω.
A random event A ∈ A is an element of the space of events.
Note: The concept of a random event was introduced in order to be able to define the probability, probability distribution, etc.

Probability, introduction
Classical definition (P.S. Laplace, 1812). It is no longer regarded as the definition of probability; it is merely an estimate of the probability: P(A) ≈ N_A / N.
Limit (frequency) definition: P(A) = lim_{N→∞} N_A / N.
[Figure: a histogram vs. a continuous pdf.]
Axiomatic definition (Andrey Kolmogorov, 1930).

Axiomatic definition of the probability
Ω is the sample space, A is the space of events.
1. P(A) ≥ 0 for every A ∈ A.
2. P(Ω) = 1.
3. If A ∩ B = ∅ then P(A ∪ B) = P(A) + P(B), for every A, B ∈ A.

Probability is a function P which assigns a number from the interval [0, 1] to events and fulfils the following two conditions:
P(true) = 1,
P(⋃_{n∈N} A_n) = Σ_{n∈N} P(A_n), if the events A_n, n ∈ N, are mutually exclusive.
From these conditions, it follows:
P(false) = 0,
P(¬A) = 1 − P(A),
if A ⊆ B then P(A) ≤ P(B).
Note: Strictly speaking, the space of events has to fulfil some additional conditions.

Derived relations
If A ⊆ B then P(B \ A) = P(B) − P(A). The symbol \ denotes the set difference.
P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

Joint probability, marginalization
The joint probability P(A, B), also sometimes denoted P(A ∩ B), is the probability that events A, B co-occur. The joint probability is symmetric: P(A, B) = P(B, A).
Marginalization (the sum rule): P(A) = Σ_B P(A, B) allows computing the probability of a single event A by summing the joint probabilities over all possible events B. The probability P(A) is called the marginal probability.

Contingency table, marginalization
Example: orienteering race, participant counts:

Age      <=15  16-25  26-35  36-45  46-55  56-65  66-75  >=76   Sum
Men        22     36     45     33     29     21     12     2   200
Women      19     32     37     30     23     14      5     0   160
Sum        41     68     82     63     52     35     17     2   360

The same example as relative frequencies (joint probabilities):

Age      <=15  16-25  26-35  36-45  46-55  56-65  66-75  >=76   Sum
Men     0.061  0.100  0.125  0.092  0.081  0.058  0.033  0.006  0.556
Women   0.053  0.089  0.103  0.083  0.064  0.039  0.014  0.000  0.444
Sum     0.114  0.189  0.228  0.175  0.144  0.097  0.047  0.006  1

The row sums give the marginal probability P(sex); the column sums give the marginal probability P(age group).
[Bar charts: marginal probability P(sex); marginal probability P(Age_group).]
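The marginals in the contingency table come from the sum rule applied to the joint counts. A minimal sketch in Python, using the counts from the orienteering example:

```python
# Counts from the orienteering example: rows = sex, columns = the 8 age groups.
counts = {
    "men":   [22, 36, 45, 33, 29, 21, 12, 2],
    "women": [19, 32, 37, 30, 23, 14, 5, 0],
}
total = sum(sum(row) for row in counts.values())  # 360 participants

# Joint probabilities P(sex, age_group), then the marginal P(sex) by the sum rule.
joint = {sex: [c / total for c in row] for sex, row in counts.items()}
p_sex = {sex: sum(row) for sex, row in joint.items()}
print(round(p_sex["men"], 3), round(p_sex["women"], 3))  # 0.556 0.444
```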

The conditional probability
Let us have the probability representation of a system given by the joint probability P(A, B).
If the additional information is available that the event B occurred, then our knowledge about the probability of the event A changes to
P(A | B) = P(A, B) / P(B),
which is the conditional probability of the event A under the condition B. The conditional probability is defined only for P(B) ≠ 0.
Product rule: P(A, B) = P(A | B) P(B) = P(B | A) P(A).
From the symmetry of the joint probability and the product rule, the Bayes theorem can be derived (to come in a more general formulation for more than two events).

Properties of the conditional probability
P(true | B) = 1, P(false | B) = 0.
If A = ⋃_{n∈N} A_n and the events A_1, A_2, ... are mutually exclusive, then P(A | B) = Σ_{n∈N} P(A_n | B).
Events A, B are independent if and only if P(A | B) = P(A).
If B ⊆ A then P(A | B) = 1. If A ∩ B = ∅ then P(A | B) = 0.
Events B_i, i ∈ I, constitute a complete system of events if they are mutually exclusive and ⋃_{i∈I} B_i = Ω.
A complete system of events has the property that one and only one of its events occurs.

Example: conditional probability
Consider rolling a single die. What is the probability that a number higher than three comes up (event A), under the condition that an odd number came up (event B)?
Ω = {1, 2, 3, 4, 5, 6}, A = {4, 5, 6}, B = {1, 3, 5}
P(A) = P(B) = 1/2
P(A, B) = P({5}) = 1/6
P(A | B) = P(A, B) / P(B) = (1/6) / (1/2) = 1/3
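The die example can be checked mechanically. A minimal sketch in Python, using exact rational arithmetic via fractions.Fraction:

```python
from fractions import Fraction

# The die example: A = "number higher than 3", B = "odd number".
omega = {1, 2, 3, 4, 5, 6}
A = {4, 5, 6}
B = {1, 3, 5}

def prob(event):
    # Classical probability on a fair die: |event| / |omega|.
    return Fraction(len(event), len(omega))

# P(A|B) = P(A, B) / P(B), with A & B the set intersection {5}.
p_a_given_b = prob(A & B) / prob(B)
print(p_a_given_b)  # 1/3
```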

The total probability theorem
Let B_i, i ∈ I, be a complete system of events and let P(B_i) ≠ 0 hold for all i ∈ I. Then for every event A it holds that
P(A) = Σ_{i∈I} P(B_i) P(A | B_i).
Proof:
P(A) = P((⋃_{i∈I} B_i) ∩ A) = P(⋃_{i∈I} (B_i ∩ A)) = Σ_{i∈I} P(B_i ∩ A) = Σ_{i∈I} P(B_i) P(A | B_i).

Bayes theorem
Let B_i, i ∈ I, be a complete system of events and P(B_i) ≠ 0 for all i ∈ I. For each event A fulfilling the condition P(A) ≠ 0 the following holds:
P(B_i | A) = P(B_i) P(A | B_i) / Σ_{i∈I} P(B_i) P(A | B_i),
where P(B_i | A) is the posterior probability, P(B_i) is the prior probability, and P(A | B_i) are the known conditional probabilities of A having observed B_i.
Proof (exploiting the total probability theorem):
P(B_i | A) = P(B_i ∩ A) / P(A) = P(B_i) P(A | B_i) / Σ_{i∈I} P(B_i) P(A | B_i).
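The formula can be sketched numerically. A minimal Python example with hypothetical numbers (the priors and likelihoods below are illustrative, not from the slides), for a complete system of two events B1, B2 and an observation A:

```python
# Hypothetical complete system {B1, B2} and observation A.
priors = {"B1": 0.3, "B2": 0.7}        # P(B_i), must sum to 1
likelihoods = {"B1": 0.9, "B2": 0.2}   # P(A | B_i)

# Denominator: the total probability P(A) = sum_i P(B_i) P(A | B_i).
p_a = sum(priors[b] * likelihoods[b] for b in priors)

# Bayes theorem: P(B_i | A) = P(B_i) P(A | B_i) / P(A).
posterior = {b: priors[b] * likelihoods[b] / p_a for b in priors}
print(round(posterior["B1"], 4))  # 0.6585, i.e. 0.27 / 0.41
```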

The importance of the Bayes theorem
Bayes theorem is a fundamental rule for machine learning (pattern recognition) as it allows computing the probability of an output given the observations.
The conditional probabilities (also called likelihoods) P(A | B_i) are estimated from experiments or from a statistical model. Having P(A | B_i), the a posteriori probabilities P(B_i | A) are determined, which serve to estimate optimally which of the events B_i occurred. One needs to know the a priori probabilities P(B_i) to determine the a posteriori probabilities P(B_i | A).
Informally: posterior ∝ prior × conditional probability of the observations given the event.
In a similar manner, we define the conditional probability distribution, the conditional density of a continuous random variable, etc.

Conditional independence
Random events A, B are conditionally independent under the condition C if
P(A ∩ B | C) = P(A | C) P(B | C).
Similarly, conditional independence of more events, random variables, etc. is defined.

Independent events
Events A, B are independent ⟺ P(A ∩ B) = P(A) P(B).
Example: A die is rolled once. Event A: a number higher than 3 comes up; event B: an odd number comes up. Are these events independent?
Ω = {1, 2, 3, 4, 5, 6}, A = {4, 5, 6}, B = {1, 3, 5}
P(A) = P(B) = 1/2
P(A ∩ B) = P({5}) = 1/6
P(A) P(B) = 1/2 · 1/2 = 1/4
P(A ∩ B) ≠ P(A) P(B), hence the events are dependent.
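The product test above can be applied to any pair of die events. A minimal sketch in Python: the slide's pair fails the test, while the contrasting (illustrative) pair A = {1, 2}, B = "even number" passes it:

```python
from fractions import Fraction

omega = frozenset({1, 2, 3, 4, 5, 6})

def prob(event):
    # Classical probability on a fair die: |event| / |omega|.
    return Fraction(len(event), len(omega))

def independent(A, B):
    # Product test: P(A & B) == P(A) * P(B).
    return prob(A & B) == prob(A) * prob(B)

# The slide's pair: A = "number higher than 3", B = "odd" -- dependent.
print(independent({4, 5, 6}, {1, 3, 5}))  # False
# A contrasting pair: A = {1, 2}, B = "even" -- independent (1/6 = 1/3 * 1/2).
print(independent({1, 2}, {2, 4, 6}))     # True
```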

The random variable
A random variable is an arbitrary function X: Ω → R, where Ω is a sample space.
Why is the concept of the random variable introduced? It allows working with concepts such as the distribution function, probability density, expectation (mean value), etc.
There are two basic types of random variables:
Discrete: a countable number of values. Examples: rolling a die; the number of cars passing through a street in an hour. The discrete probability is given as P(X = a_i) = p(a_i), i = 1, ..., with Σ_i p(a_i) = 1.
Continuous: values from some interval, i.e., an infinite number of values. Example: the height of persons. The continuous probability is given by the distribution function or the probability density function.

Distribution function of a random variable
The distribution function of the random variable X is a function F: R → [0, 1] defined as F(x) = P(X ≤ x), where P is a probability.
Properties:
1. F(x) is a non-decreasing function, i.e., for every pair x_1 < x_2 it holds that F(x_1) ≤ F(x_2).
2. F(x) is continuous from the right, i.e., it holds that lim_{h→0+} F(x + h) = F(x).
3. For every distribution function it holds that lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1. Written more concisely: F(−∞) = 0, F(∞) = 1.
If the possible values of X are from the interval (a, b) then F(a) = 0, F(b) = 1.
Any function fulfilling the above three properties can be understood as a distribution function.

Continuous distribution function
The distribution function F is called (absolutely) continuous if a nonnegative function f (the probability density) exists such that
F(x) = ∫_{−∞}^{x} f(u) du for every x.
The probability density fulfils ∫_{−∞}^{∞} f(x) dx = 1.
[Figure: a density f(x); the area under the curve equals 1.]
If the derivative of F(x) exists at the point x, then F′(x) = f(x).
For a, b ∈ R, a < b, it holds that P(a < X < b) = ∫_a^b f(x) dx = F(b) − F(a).
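The identity P(a < X < b) = ∫_a^b f(x) dx = F(b) − F(a) can be checked numerically. A minimal sketch, using the exponential density f(x) = e^(−x) on x ≥ 0 (a convenient choice, not from the slides) whose distribution function F(x) = 1 − e^(−x) is known in closed form:

```python
import math

# Exponential density on x >= 0 and its known distribution function.
f = lambda x: math.exp(-x)
F = lambda x: 1.0 - math.exp(-x)

def integrate(g, a, b, n=100_000):
    # Trapezoidal-rule approximation of the integral of g over [a, b].
    h = (b - a) / n
    s = 0.5 * (g(a) + g(b)) + sum(g(a + i * h) for i in range(1, n))
    return s * h

a, b = 0.5, 2.0
# The numeric integral of the density matches F(b) - F(a).
print(abs(integrate(f, a, b) - (F(b) - F(a))) < 1e-8)  # True
```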

Example, normal distribution
Probability density (zero mean): f(x) = 1/√(2πσ²) · e^{−x²/(2σ²)}.
[Figure: the distribution function F(x) and the probability density f(x).]
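A minimal sketch of the zero-mean normal density from the slide, with its distribution function expressed through the error function math.erf (the function names below are illustrative):

```python
import math

def normal_pdf(x, sigma=1.0):
    # Zero-mean normal density: f(x) = exp(-x^2 / (2 sigma^2)) / sqrt(2 pi sigma^2).
    return math.exp(-x * x / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def normal_cdf(x, sigma=1.0):
    # The distribution function F(x), written via the error function.
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

print(round(normal_pdf(0.0), 4))  # 0.3989, i.e. 1 / sqrt(2 pi)
print(normal_cdf(0.0))            # 0.5, by symmetry of the density
```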

The law of large numbers
The law of large numbers says that if very many independent experiments are made, then it is almost certain that the relative frequency converges to the theoretical value of the probability.
Jakob Bernoulli, Ars Conjectandi: Usum & Applicationem Praecedentis Doctrinae in Civilibus, Moralibus & Oeconomicis, 1713, Chapter 4.
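The convergence of the relative frequency can be sketched by simulation; the coin-flip setup below is illustrative. With a fair coin, the relative frequency of heads approaches P(heads) = 1/2 as the number of flips grows:

```python
import random

random.seed(0)  # fixed seed only to make the run reproducible

def relative_frequency(n_flips):
    # Relative frequency of "heads" in n_flips fair coin flips.
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The frequency drifts toward the probability 1/2 as N grows.
for n in (100, 10_000, 1_000_000):
    print(n, relative_frequency(n))
```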

Expectation
(Mathematical) expectation = the average of a variable under the probability distribution.
Continuous definition: E(x) = ∫ x f(x) dx.
Discrete definition: E(x) = Σ_x x P(x).
The expectation can be estimated from a number of samples by E(x) ≈ (1/N) Σ_i x_i. The approximation becomes exact for N → ∞.
Expectation over multiple variables: E_x(x, y) = ∫ (x, y) f(x) dx.
Conditional expectation: E(x | y) = ∫ x f(x | y) dx.
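The discrete definition and the sample-average estimate can be compared directly. A minimal sketch for a fair die, where E(x) = Σ_x x · (1/6) = 3.5:

```python
import random

# Discrete definition of the expectation for a fair die.
exact = sum(x * (1 / 6) for x in range(1, 7))  # 3.5

# Sample-average estimate E(x) ~ (1/N) sum_i x_i from simulated rolls.
random.seed(1)  # fixed seed only for reproducibility
samples = [random.randint(1, 6) for _ in range(200_000)]
estimate = sum(samples) / len(samples)
print(exact, estimate)  # the estimate is close to 3.5 for large N
```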

Basic characteristics of a random variable

                       Continuous distribution                     Discrete distribution
Expectation            E(x) = ∫ x f(x) dx                          E(x) = Σ_x x P(x)
k-th (general) moment  E(x^k) = ∫ x^k f(x) dx                      E(x^k) = Σ_x x^k P(x)
k-th central moment    E((x − E(x))^k) = ∫ (x − E(x))^k f(x) dx    Σ_x (x − E(x))^k P(x)
Dispersion             D(x) = ∫ (x − E(x))^2 f(x) dx               D(x) = Σ_x (x − E(x))^2 P(x)

Covariance
The mutual covariance σ_xy of two random variables X, Y is σ_xy = E((X − µ_x)(Y − µ_y)).
The covariance matrix Σ of n variables X_1, ..., X_n is
Σ = [ σ_1²  ...  σ_1n ]
    [  ...  ...  ...  ]
    [ σ_n1  ...  σ_n² ]
The covariance matrix is symmetric (i.e., Σ = Σᵀ) and positive semidefinite (as the covariance matrix is real-valued, positive semidefinite means that xᵀΣx ≥ 0 for all x ∈ Rⁿ).
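A minimal sketch of the covariance matrix for n = 2 variables, estimated from paired samples in pure Python (the sample data are illustrative; the population divisor N is used, matching the definition via expectations):

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # ys = 2 * xs, so the variables are perfectly correlated

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    # Sample estimate of E((U - mu_u)(V - mu_v)), population divisor N.
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

# Sigma = [[var(x), cov(x,y)], [cov(y,x), var(y)]], symmetric by construction.
sigma = [[cov(xs, xs), cov(xs, ys)],
         [cov(ys, xs), cov(ys, ys)]]
print(sigma)  # [[1.25, 2.5], [2.5, 5.0]]
```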

Quantiles, median
The p-quantile Q_p: P(X < Q_p) = p.
The median is the p-quantile for p = 1/2, i.e., P(X < Q_p) = 1/2.
Note: The median is often used as a replacement for the mean value in robust statistics.
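An empirical p-quantile can be sketched from sorted samples; the simple index rule below is one of several conventions (an assumption, not from the slides):

```python
def quantile(samples, p):
    # Empirical p-quantile: the sorted sample at index floor(p * N),
    # clamped to the valid range (one common convention among several).
    s = sorted(samples)
    k = max(0, min(len(s) - 1, int(p * len(s))))
    return s[k]

data = [7, 1, 5, 3, 9, 2, 8, 4, 6]  # the values 1..9, shuffled
print(quantile(data, 0.5))  # 5, the median of 1..9
```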