# Probability theory: random variables, distributions, and information theory


## Discrete random variables

### Probability mass function

Given a discrete random variable $X$ taking values in $\mathcal{X} = \{v_1, \dots, v_m\}$, its probability mass function $P : \mathcal{X} \to [0, 1]$ is defined as:

$$P(v_i) = \Pr[X = v_i]$$

and satisfies the following conditions:

- $P(x) \geq 0$
- $\sum_{x \in \mathcal{X}} P(x) = 1$

### Expected value

The expected value, mean or average of a random variable $x$ is:

$$E[x] = \mu = \sum_{x \in \mathcal{X}} x\,P(x) = \sum_{i=1}^{m} v_i\,P(v_i)$$

The expectation operator is linear:

$$E[\lambda x + \lambda' y] = \lambda E[x] + \lambda' E[y]$$

### Variance

The variance of a random variable is the moment of inertia of its probability mass function:

$$\mathrm{Var}[x] = \sigma^2 = E[(x - \mu)^2] = \sum_{x \in \mathcal{X}} (x - \mu)^2 P(x)$$

The standard deviation $\sigma$ indicates the typical amount of deviation from the mean one should expect for a randomly drawn value of $x$.

### Properties of mean and variance

- second moment: $E[x^2] = \sum_{x \in \mathcal{X}} x^2 P(x)$
- variance in terms of expectation: $\mathrm{Var}[x] = E[x^2] - E[x]^2$
- variance and scalar multiplication: $\mathrm{Var}[\lambda x] = \lambda^2\, \mathrm{Var}[x]$
- variance of uncorrelated variables: $\mathrm{Var}[x + y] = \mathrm{Var}[x] + \mathrm{Var}[y]$
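These identities can be checked numerically. A minimal sketch using a fair six-sided die as the example distribution (the values and probabilities are illustrative, not part of the notes):

```python
# Example pmf: a fair six-sided die, X = {1, ..., 6} with P(v_i) = 1/6.
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

# Expected value: E[x] = sum over x of x * P(x).
mean = sum(v * p for v, p in zip(values, probs))

# Variance as the moment of inertia: Var[x] = sum over x of (x - mu)^2 * P(x).
var = sum((v - mean) ** 2 * p for v, p in zip(values, probs))

# Second moment: E[x^2].
second_moment = sum(v ** 2 * p for v, p in zip(values, probs))

# Check the identity Var[x] = E[x^2] - E[x]^2.
assert abs(var - (second_moment - mean ** 2)) < 1e-12
assert abs(mean - 3.5) < 1e-12        # E[x] = 3.5
assert abs(var - 35 / 12) < 1e-12     # Var[x] = 35/12
```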

## Probability distributions

### Bernoulli distribution

Two possible values (outcomes): 1 (success), 0 (failure). Parameters: $p$, the probability of success. Probability mass function:

$$P(x; p) = \begin{cases} p & \text{if } x = 1 \\ 1 - p & \text{if } x = 0 \end{cases}$$

$$E[x] = p \qquad \mathrm{Var}[x] = p(1 - p)$$

Example: tossing a coin

- head (success) and tail (failure) are the possible outcomes
- $p$ is the probability of head

### Bernoulli distribution: proof of mean

$$E[x] = \sum_{x \in \mathcal{X}} x\,P(x) = \sum_{x \in \{0, 1\}} x\,P(x) = 0 \cdot (1 - p) + 1 \cdot p = p$$

### Bernoulli distribution: proof of variance

$$\begin{aligned}
\mathrm{Var}[x] &= \sum_{x \in \mathcal{X}} (x - \mu)^2 P(x) = \sum_{x \in \{0, 1\}} (x - p)^2 P(x) \\
&= (0 - p)^2 (1 - p) + (1 - p)^2 p \\
&= p^2 (1 - p) + (1 - p)(1 - p)\,p \\
&= (1 - p)(p^2 + p - p^2) \\
&= (1 - p)\,p
\end{aligned}$$
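The two proofs above can be mirrored numerically by summing over the pmf directly; $p = 0.3$ below is an arbitrary example value:

```python
# Bernoulli pmf with an arbitrary success probability p.
p = 0.3
pmf = {1: p, 0: 1 - p}

# E[x] = 0*(1-p) + 1*p = p.
mean = sum(x * px for x, px in pmf.items())

# Var[x] = sum over x of (x - mu)^2 * P(x) = p(1-p).
var = sum((x - mean) ** 2 * px for x, px in pmf.items())

assert abs(mean - p) < 1e-12
assert abs(var - p * (1 - p)) < 1e-12
```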

### Binomial distribution

Probability of a certain number of successes in $n$ independent Bernoulli trials. Parameters: $p$ probability of success, $n$ number of trials. Probability mass function:

$$P(x; p, n) = \binom{n}{x} p^x (1 - p)^{n - x}$$

$$E[x] = np \qquad \mathrm{Var}[x] = np(1 - p)$$

Example: tossing a coin

- $n$ is the number of coin tosses
- $P(x; p, n)$ is the probability of obtaining $x$ heads

## Pairs of discrete random variables

### Probability mass function

Given a pair of discrete random variables $X$ and $Y$ taking values $\mathcal{X} = \{v_1, \dots, v_m\}$ and $\mathcal{Y} = \{w_1, \dots, w_n\}$, the joint probability mass function is defined as:

$$P(v_i, w_j) = \Pr[X = v_i, Y = w_j]$$

with properties:

- $P(x, y) \geq 0$
- $\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} P(x, y) = 1$

### Properties

Expected value:

$$\mu_x = E[x] = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} x\,P(x, y) \qquad \mu_y = E[y] = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} y\,P(x, y)$$

Variance:

$$\sigma_x^2 = E[(x - \mu_x)^2] = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} (x - \mu_x)^2 P(x, y) \qquad \sigma_y^2 = E[(y - \mu_y)^2] = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} (y - \mu_y)^2 P(x, y)$$

Covariance:

$$\sigma_{xy} = E[(x - \mu_x)(y - \mu_y)] = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} (x - \mu_x)(y - \mu_y) P(x, y)$$

Correlation coefficient:

$$\rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$
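The joint moments above can be computed directly from a small joint pmf; the numbers in `joint` below are arbitrary example values:

```python
# A joint pmf over two binary variables (x, y); values are illustrative only.
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.2}

# Means: mu_x = sum over (x, y) of x * P(x, y), similarly for mu_y.
mu_x = sum(x * p for (x, y), p in joint.items())
mu_y = sum(y * p for (x, y), p in joint.items())

# Variances and covariance, as double sums over the joint pmf.
var_x = sum((x - mu_x) ** 2 * p for (x, y), p in joint.items())
var_y = sum((y - mu_y) ** 2 * p for (x, y), p in joint.items())
cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())

# Correlation coefficient rho = cov / (sigma_x * sigma_y).
rho = cov / (var_x ** 0.5 * var_y ** 0.5)

assert abs(sum(joint.values()) - 1) < 1e-12   # valid pmf
assert -1 <= rho <= 1                         # correlation is bounded
```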

## Probability distributions

### Multinomial distribution (one sample)

Models the probability of a certain outcome for an event with $m$ possible outcomes. Parameters: $p_1, \dots, p_m$, the probability of each outcome. Probability mass function:

$$P(x_1, \dots, x_m; p_1, \dots, p_m) = \prod_{i=1}^{m} p_i^{x_i}$$

where $x_1, \dots, x_m$ is a vector with $x_i = 1$ for outcome $i$ and $x_j = 0$ for all $j \neq i$.

$$E[x_i] = p_i \qquad \mathrm{Var}[x_i] = p_i(1 - p_i) \qquad \mathrm{Cov}[x_i, x_j] = -p_i p_j$$

### Multinomial distribution: example

Rolling a die with six faces:

- $m$ is the number of faces
- $p_i$ is the probability of obtaining face $i$

### Multinomial distribution (general case)

Given $n$ samples of an event with $m$ possible outcomes, models the probability of a certain distribution of outcomes. Parameters: $p_1, \dots, p_m$ probability of each outcome, $n$ number of samples. Probability mass function (assuming $\sum_{i=1}^{m} x_i = n$):

$$P(x_1, \dots, x_m; p_1, \dots, p_m, n) = \frac{n!}{\prod_{i=1}^{m} x_i!} \prod_{i=1}^{m} p_i^{x_i}$$

$$E[x_i] = np_i \qquad \mathrm{Var}[x_i] = np_i(1 - p_i) \qquad \mathrm{Cov}[x_i, x_j] = -np_i p_j$$
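The general-case pmf can be checked by enumeration: summed over all count vectors with $\sum_i x_i = n$, it must equal 1 (the multinomial theorem). A sketch with arbitrary example parameters ($m = 3$, $n = 4$):

```python
from math import factorial
from itertools import product as cartesian

def multinomial_pmf(counts, probs):
    """P(x_1..x_m; p_1..p_m, n) = n! / prod(x_i!) * prod(p_i^x_i)."""
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)
    val = float(coef)
    for c, p in zip(counts, probs):
        val *= p ** c
    return val

probs = [0.2, 0.3, 0.5]   # arbitrary outcome probabilities (sum to 1)
n = 4

# Enumerate all count vectors (x1, x2, x3) with x1 + x2 + x3 = n.
total = sum(
    multinomial_pmf(c, probs)
    for c in cartesian(range(n + 1), repeat=3)
    if sum(c) == n
)
assert abs(total - 1) < 1e-12   # probabilities over all outcomes sum to 1
```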

### Multinomial distribution: example

Rolling a die repeatedly:

- $n$ is the number of times the die is rolled
- $x_i$ is the number of times face $i$ is obtained
- $p_i$ is the probability of obtaining face $i$

## Conditional probabilities

### Conditional probability

The probability of $x$ once $y$ is observed:

$$P(x \mid y) = \frac{P(x, y)}{P(y)}$$

### Statistical independence

Variables $X$ and $Y$ are statistically independent iff:

$$P(x, y) = P(x)\,P(y)$$

implying:

$$P(x \mid y) = P(x) \qquad P(y \mid x) = P(y)$$

## Basic rules

### Law of total probability

The marginal distribution of a variable is obtained from a joint distribution by summing over all possible values of the other variable (sum rule):

$$P(x) = \sum_{y \in \mathcal{Y}} P(x, y) \qquad P(y) = \sum_{x \in \mathcal{X}} P(x, y)$$

### Product rule

The definition of conditional probability implies that:

$$P(x, y) = P(x \mid y)\,P(y) = P(y \mid x)\,P(x)$$

### Bayes' rule

$$P(y \mid x) = \frac{P(x \mid y)\,P(y)}{P(x)}$$

### Bayes' rule: significance

$$P(y \mid x) = \frac{P(x \mid y)\,P(y)}{P(x)}$$

allows inverting statistical connections between effect ($x$) and cause ($y$):

$$\text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{evidence}}$$

The evidence can be obtained using the sum rule from likelihood and prior:

$$P(x) = \sum_{y} P(x, y) = \sum_{y} P(x \mid y)\,P(y)$$
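Bayes' rule with the evidence computed by the sum rule can be illustrated with a classic cause/effect setup; the disease/test numbers below are hypothetical, chosen only to make the inversion concrete:

```python
# Hypothetical numbers: cause y = disease status, effect x = positive test.
prior = {'disease': 0.01, 'healthy': 0.99}        # P(y)
likelihood = {'disease': 0.95, 'healthy': 0.05}   # P(x = positive | y)

# Evidence via the sum rule: P(x) = sum over y of P(x | y) P(y).
evidence = sum(likelihood[y] * prior[y] for y in prior)

# Posterior via Bayes' rule: P(y | x) = P(x | y) P(y) / P(x).
posterior = {y: likelihood[y] * prior[y] / evidence for y in prior}

assert abs(sum(posterior.values()) - 1) < 1e-12
print(round(posterior['disease'], 4))  # prints 0.161
```

Even with a 95% accurate test, the posterior probability of disease stays low because the prior is small; this is exactly the effect-to-cause inversion the rule formalizes.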

## Playing with probabilities

### Use rules!

- Basic rules allow us to model a certain probability (e.g. cause given effect) given knowledge of some related ones (e.g. likelihood, prior)
- All our manipulations will be applications of the three basic rules
- Basic rules apply to any number of variables:

$$\begin{aligned}
P(y) &= \sum_x \sum_z P(x, y, z) & \text{(sum rule)} \\
&= \sum_x \sum_z P(y \mid x, z)\,P(x, z) & \text{(product rule)} \\
&= \sum_x \sum_z \frac{P(x \mid y, z)\,P(y \mid z)\,P(x, z)}{P(x \mid z)} & \text{(Bayes' rule)}
\end{aligned}$$

### Example

$$\begin{aligned}
P(y \mid x, z) &= \frac{P(x, z \mid y)\,P(y)}{P(x, z)} & \text{(Bayes' rule)} \\
&= \frac{P(x, z \mid y)\,P(y)}{P(x \mid z)\,P(z)} & \text{(product rule)} \\
&= \frac{P(x \mid z, y)\,P(z \mid y)\,P(y)}{P(x \mid z)\,P(z)} & \text{(product rule)} \\
&= \frac{P(x \mid z, y)\,P(z, y)}{P(x \mid z)\,P(z)} & \text{(product rule)} \\
&= \frac{P(x \mid z, y)\,P(y \mid z)\,P(z)}{P(x \mid z)\,P(z)} & \text{(product rule)} \\
&= \frac{P(x \mid z, y)\,P(y \mid z)}{P(x \mid z)}
\end{aligned}$$

## Continuous random variables

### Probability density function

Instead of the probability of a specific value of $X$, we model the probability that $x$ falls in an interval $(a, b)$:

$$\Pr[x \in (a, b)] = \int_a^b p(x)\,dx$$

Properties:

- $p(x) \geq 0$
- $\int_{-\infty}^{\infty} p(x)\,dx = 1$

Note: the probability density at a specific value $x_0$ (the probability of any single point is zero) is given by:

$$p(x_0) = \lim_{\epsilon \to 0} \frac{1}{\epsilon} \Pr[x \in [x_0, x_0 + \epsilon)]$$
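The chain of manipulations in the example above can be verified numerically: build an arbitrary joint distribution over three binary variables, compute every factor by the sum rule, and check that $P(y \mid x, z)$ equals $P(x \mid z, y)\,P(y \mid z) / P(x \mid z)$ for every assignment. A minimal sketch (the joint distribution is random, purely for illustration):

```python
import random

# An arbitrary normalized joint distribution P(x, y, z) over binary variables.
random.seed(0)
raw = {(x, y, z): random.random() for x in (0, 1) for y in (0, 1) for z in (0, 1)}
total = sum(raw.values())
joint = {k: v / total for k, v in raw.items()}

def marginal(**fixed):
    """Sum rule: sum the joint over every variable not fixed by keyword."""
    return sum(
        p for (x, y, z), p in joint.items()
        if all({'x': x, 'y': y, 'z': z}[name] == val for name, val in fixed.items())
    )

for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            lhs = marginal(x=x, y=y, z=z) / marginal(x=x, z=z)        # P(y|x,z)
            rhs = (marginal(x=x, y=y, z=z) / marginal(y=y, z=z)       # P(x|z,y)
                   * marginal(y=y, z=z) / marginal(z=z)               # P(y|z)
                   / (marginal(x=x, z=z) / marginal(z=z)))            # P(x|z)
            assert abs(lhs - rhs) < 1e-12
```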

### Properties

Expected value:

$$E[x] = \mu = \int_{-\infty}^{\infty} x\,p(x)\,dx$$

Variance:

$$\mathrm{Var}[x] = \sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2\,p(x)\,dx$$

Note: definitions and formulas for discrete random variables carry over to continuous random variables, with sums replaced by integrals.

## Probability distributions

### Gaussian (or normal) distribution

Bell-shaped curve. Parameters: $\mu$ mean, $\sigma^2$ variance. Probability density function:

$$p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

$$E[x] = \mu \qquad \mathrm{Var}[x] = \sigma^2$$
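The Gaussian density and the integral definitions of mean and variance can be checked with a simple Riemann sum; $\mu = 1$, $\sigma = 2$ are arbitrary example parameters, and the integration range covers roughly $\mu \pm 10\sigma$:

```python
from math import exp, pi, sqrt

def gaussian_pdf(x, mu, sigma):
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)."""
    return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sqrt(2 * pi) * sigma)

mu, sigma = 1.0, 2.0            # arbitrary example parameters
step = 0.001
xs = [-20 + i * step for i in range(int(42 / step))]   # grid over [-20, 22)

# Numerical versions of the three defining integrals.
total = sum(gaussian_pdf(x, mu, sigma) * step for x in xs)
mean = sum(x * gaussian_pdf(x, mu, sigma) * step for x in xs)
var = sum((x - mu) ** 2 * gaussian_pdf(x, mu, sigma) * step for x in xs)

assert abs(total - 1) < 1e-5        # density integrates to 1
assert abs(mean - mu) < 1e-3        # E[x] = mu
assert abs(var - sigma ** 2) < 1e-2 # Var[x] = sigma^2
```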

### Standard normal distribution: $N(0, 1)$

Standardization of a normal distribution $N(\mu, \sigma^2)$:

$$z = \frac{x - \mu}{\sigma}$$

### Beta distribution

Defined in the interval $[0, 1]$. Parameters: $\alpha, \beta$. Probability density function:

$$p(x; \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha - 1} (1 - x)^{\beta - 1}$$

$$E[x] = \frac{\alpha}{\alpha + \beta} \qquad \mathrm{Var}[x] = \frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$$

where the gamma function satisfies $\Gamma(x + 1) = x\,\Gamma(x)$ and $\Gamma(1) = 1$.

Note: the Beta distribution models the posterior distribution of the parameter $p$ of a binomial distribution after observing $\alpha - 1$ independent events with probability $p$ and $\beta - 1$ with probability $1 - p$.
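The normalization and the mean formula of the Beta density can be verified by midpoint integration over $[0, 1]$; $\alpha = 2$, $\beta = 5$ are arbitrary example parameters:

```python
from math import gamma

def beta_pdf(x, a, b):
    """Beta density: Gamma(a+b) / (Gamma(a) Gamma(b)) * x^(a-1) (1-x)^(b-1)."""
    return gamma(a + b) / (gamma(a) * gamma(b)) * x ** (a - 1) * (1 - x) ** (b - 1)

a, b = 2.0, 5.0                 # arbitrary example parameters
step = 1e-5
xs = [(i + 0.5) * step for i in range(int(1 / step))]  # midpoints of [0, 1]

total = sum(beta_pdf(x, a, b) * step for x in xs)
mean = sum(x * beta_pdf(x, a, b) * step for x in xs)

assert abs(total - 1) < 1e-6            # density integrates to 1
assert abs(mean - a / (a + b)) < 1e-6   # E[x] = alpha / (alpha + beta)
```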

### Multivariate normal distribution

Normal distribution for $d$-dimensional vectorial data. Parameters: $\boldsymbol{\mu}$ mean vector, $\Sigma$ covariance matrix. Probability density function:

$$p(\mathbf{x}; \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)$$

$$E[\mathbf{x}] = \boldsymbol{\mu} \qquad \mathrm{Var}[\mathbf{x}] = \Sigma$$

The squared Mahalanobis distance from $\mathbf{x}$ to $\boldsymbol{\mu}$ is the standard measure of distance to the mean:

$$r^2 = (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})$$

### Dirichlet distribution

Defined for $x \in [0, 1]^m$ with $\sum_{i=1}^{m} x_i = 1$. Parameters: $\boldsymbol{\alpha} = \alpha_1, \dots, \alpha_m$. Probability density function:

$$p(x_1, \dots, x_m; \boldsymbol{\alpha}) = \frac{\Gamma(\alpha_0)}{\prod_{i=1}^{m} \Gamma(\alpha_i)} \prod_{i=1}^{m} x_i^{\alpha_i - 1}$$

where $\alpha_0 = \sum_{j=1}^{m} \alpha_j$, and:

$$E[x_i] = \frac{\alpha_i}{\alpha_0} \qquad \mathrm{Var}[x_i] = \frac{\alpha_i (\alpha_0 - \alpha_i)}{\alpha_0^2 (\alpha_0 + 1)} \qquad \mathrm{Cov}[x_i, x_j] = -\frac{\alpha_i \alpha_j}{\alpha_0^2 (\alpha_0 + 1)}$$

Note: the Dirichlet distribution models the posterior distribution of the parameters $\mathbf{p}$ of a multinomial distribution after observing $\alpha_i - 1$ occurrences of each of the $m$ mutually exclusive events.

## Probability laws

### Expectation of an average

Consider a sample $X_1, \dots, X_n$ of i.i.d. instances drawn from a distribution with mean $\mu$ and variance $\sigma^2$. Consider the random variable $\bar{X}_n$ measuring the sample average:

$$\bar{X}_n = \frac{X_1 + \cdots + X_n}{n}$$

Its expectation is computed using linearity ($E[a(X + Y)] = a(E[X] + E[Y])$):

$$E[\bar{X}_n] = \frac{1}{n}\left(E[X_1] + \cdots + E[X_n]\right) = \mu$$

i.e. the expectation of an average is the true mean of the distribution.

### Variance of an average

Consider the random variable $\bar{X}_n$ measuring the sample average:

$$\bar{X}_n = \frac{X_1 + \cdots + X_n}{n}$$

Its variance is computed using $\mathrm{Var}[a(X + Y)] = a^2(\mathrm{Var}[X] + \mathrm{Var}[Y])$ for $X$ and $Y$ independent:

$$\mathrm{Var}[\bar{X}_n] = \frac{1}{n^2}\left(\mathrm{Var}[X_1] + \cdots + \mathrm{Var}[X_n]\right) = \frac{\sigma^2}{n}$$

i.e. the variance of the average decreases with the number of observations (the more examples you see, the more likely you are to estimate the correct average).

### Chebyshev's inequality

Consider a random variable $X$ with mean $\mu$ and variance $\sigma^2$. Chebyshev's inequality states that for all $a > 0$:

$$\Pr[|X - \mu| \geq a] \leq \frac{\sigma^2}{a^2}$$

Replacing $a = k\sigma$ for $k > 0$ we obtain:

$$\Pr[|X - \mu| \geq k\sigma] \leq \frac{1}{k^2}$$

Note: Chebyshev's inequality shows that most of the probability mass of a random variable stays within a few standard deviations of its mean.

### The law of large numbers

Consider a sample $X_1, \dots, X_n$ of i.i.d. instances drawn from a distribution with mean $\mu$ and variance $\sigma^2$. For any $\epsilon > 0$, the sample average $\bar{X}_n$ obeys:

$$\lim_{n \to \infty} \Pr[|\bar{X}_n - \mu| > \epsilon] = 0$$

It can be shown using Chebyshev's inequality and the facts that $E[\bar{X}_n] = \mu$ and $\mathrm{Var}[\bar{X}_n] = \sigma^2 / n$:

$$\Pr[|\bar{X}_n - E[\bar{X}_n]| \geq \epsilon] \leq \frac{\sigma^2}{n\epsilon^2}$$

Interpretation: the accuracy of an empirical statistic increases with the number of samples.
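The two results $E[\bar{X}_n] = \mu$ and $\mathrm{Var}[\bar{X}_n] = \sigma^2 / n$ can be illustrated empirically with a seeded simulation; the uniform distribution and the sample sizes below are arbitrary choices:

```python
import random

# X_i uniform on [0, 1]: mu = 0.5, sigma^2 = 1/12.
random.seed(42)
n, trials = 100, 2000

# Draw many independent sample averages of n observations each.
averages = [sum(random.random() for _ in range(n)) / n for _ in range(trials)]

emp_mean = sum(averages) / trials
emp_var = sum((a - emp_mean) ** 2 for a in averages) / trials

assert abs(emp_mean - 0.5) < 0.01           # E[X_bar_n] = mu
assert abs(emp_var - (1 / 12) / n) < 3e-4   # Var[X_bar_n] = sigma^2 / n
```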

### Central limit theorem

Consider a sample $X_1, \dots, X_n$ of i.i.d. instances drawn from a distribution with mean $\mu$ and variance $\sigma^2$.

1. Regardless of the distribution of the $X_i$, for $n \to \infty$ the distribution of the sample average $\bar{X}_n$ approaches a normal distribution.
2. Its mean approaches $\mu$ and its variance approaches $\sigma^2 / n$.
3. Thus the normalized sample average:

$$z = \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}}$$

approaches a standard normal distribution $N(0, 1)$.

Interpretation:

- The sum of a sufficiently large sample of i.i.d. random measurements is approximately normally distributed
- We don't need to know the form of their distribution (it can be arbitrary)
- This justifies the importance of the normal distribution in real-world applications

## Information theory

### Entropy

Consider a discrete set of symbols $\mathcal{V} = \{v_1, \dots, v_n\}$ with mutually exclusive probabilities $P(v_i)$. We aim at designing a binary code for each symbol, minimizing the average length of messages. Shannon and Weaver (1949) proved that the optimal code assigns to each symbol $v_i$ a number of bits equal to $-\log P(v_i)$. The entropy of the set of symbols is the expected length of a message encoding a symbol, assuming such an optimal coding:

$$H[\mathcal{V}] = E[-\log P(v)] = -\sum_{i=1}^{n} P(v_i) \log P(v_i)$$

### Cross entropy

Consider two distributions $P$ and $Q$ over variable $X$. The cross entropy between $P$ and $Q$ measures the expected number of bits needed to code a symbol sampled from $P$ using $Q$ instead:

$$H(P; Q) = E_P[-\log Q(v)] = -\sum_{i=1}^{n} P(v_i) \log Q(v_i)$$
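Entropy and cross entropy are easy to compute directly; the four-symbol distribution below is chosen so the optimal code lengths $-\log_2 P(v_i)$ are whole numbers of bits (1, 2, 3, 3):

```python
from math import log2

def entropy(p):
    """H[V] = -sum P(v) log2 P(v), in bits."""
    return -sum(x * log2(x) for x in p if x > 0)

def cross_entropy(p, q):
    """H(P; Q) = -sum P(v) log2 Q(v): cost of coding P with Q's code."""
    return -sum(x * log2(y) for x, y in zip(p, q) if x > 0)

# Probabilities 1/2, 1/4, 1/8, 1/8: optimal code lengths 1, 2, 3, 3 bits.
p = [0.5, 0.25, 0.125, 0.125]
assert abs(entropy(p) - 1.75) < 1e-12          # expected message length: 1.75 bits

# Coding symbols from p with a uniform code (2 bits each) is suboptimal.
q = [0.25] * 4
assert abs(cross_entropy(p, q) - 2.0) < 1e-12
assert cross_entropy(p, q) >= entropy(p)       # H(P; Q) >= H(P)
```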

### Relative entropy

Consider two distributions $P$ and $Q$ over variable $X$. The relative entropy or Kullback-Leibler (KL) divergence measures the expected length difference when coding instances sampled from $P$ using $Q$ instead:

$$\begin{aligned}
D_{KL}(P \| Q) &= H(P; Q) - H(P) \\
&= -\sum_{i=1}^{n} P(v_i) \log Q(v_i) + \sum_{i=1}^{n} P(v_i) \log P(v_i) \\
&= \sum_{i=1}^{n} P(v_i) \log \frac{P(v_i)}{Q(v_i)}
\end{aligned}$$

Note: the KL divergence is not a distance (metric), as it is not necessarily symmetric.

### Conditional entropy

Consider two variables $V, W$ with (possibly different) distributions $P$. The conditional entropy is the entropy remaining for variable $W$ once $V$ is known:

$$H(W \mid V) = \sum_v P(v)\,H(W \mid V = v) = -\sum_v P(v) \sum_w P(w \mid v) \log P(w \mid v)$$

### Mutual information

Consider two variables $V, W$ with (possibly different) distributions $P$. The mutual information (or information gain) is the reduction in entropy for $W$ once $V$ is known:

$$I(W; V) = H(W) - H(W \mid V) = -\sum_w P(w) \log P(w) + \sum_v P(v) \sum_w P(w \mid v) \log P(w \mid v)$$
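The KL divergence and the mutual information can be computed on small hand-picked distributions (all numbers below are arbitrary examples), confirming that KL is non-negative and asymmetric and that information gain is non-negative:

```python
from math import log2

def entropy(p):
    return -sum(x * log2(x) for x in p if x > 0)

def kl(p, q):
    """D_KL(P || Q) = sum P(v) log2 (P(v) / Q(v))."""
    return sum(x * log2(x / y) for x, y in zip(p, q) if x > 0)

# KL divergence on two arbitrary distributions: non-negative, not symmetric.
p = [0.5, 0.25, 0.25]
q = [0.4, 0.4, 0.2]
assert kl(p, q) >= 0
assert abs(kl(p, q) - kl(q, p)) > 1e-9        # asymmetric: not a metric

# Mutual information I(W; V) = H(W) - H(W | V) from a small joint P(v, w).
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
pv = {v: sum(x for (vv, w), x in joint.items() if vv == v) for v in (0, 1)}
pw = {w: sum(x for (v, ww), x in joint.items() if ww == w) for w in (0, 1)}

h_w = entropy(list(pw.values()))
# H(W | V) = -sum over (v, w) of P(v, w) log2 P(w | v).
h_w_given_v = -sum(
    joint[(v, w)] * log2(joint[(v, w)] / pv[v])
    for v in (0, 1) for w in (0, 1)
)
mi = h_w - h_w_given_v
assert mi >= 0                                # information gain is non-negative
```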


### RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS

RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS. DISCRETE RANDOM VARIABLES.. Definition of a Discrete Random Variable. A random variable X is said to be discrete if it can assume only a finite or countable

### Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011

Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Name: Section: I pledge my honor that I have not violated the Honor Code Signature: This exam has 34 pages. You have 3 hours to complete this

### Lecture 8: More Continuous Random Variables

Lecture 8: More Continuous Random Variables 26 September 2005 Last time: the eponential. Going from saying the density e λ, to f() λe λ, to the CDF F () e λ. Pictures of the pdf and CDF. Today: the Gaussian

### STAT 830 Convergence in Distribution

STAT 830 Convergence in Distribution Richard Lockhart Simon Fraser University STAT 830 Fall 2011 Richard Lockhart (Simon Fraser University) STAT 830 Convergence in Distribution STAT 830 Fall 2011 1 / 31

### Statistical Machine Learning

Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

### Probability Generating Functions

page 39 Chapter 3 Probability Generating Functions 3 Preamble: Generating Functions Generating functions are widely used in mathematics, and play an important role in probability theory Consider a sequence

### Joint Probability Distributions and Random Samples. Week 5, 2011 Stat 4570/5570 Material from Devore s book (Ed 8), and Cengage

5 Joint Probability Distributions and Random Samples Week 5, 2011 Stat 4570/5570 Material from Devore s book (Ed 8), and Cengage Two Discrete Random Variables The probability mass function (pmf) of a single

### Quadratic forms Cochran s theorem, degrees of freedom, and all that

Quadratic forms Cochran s theorem, degrees of freedom, and all that Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 1, Slide 1 Why We Care Cochran s theorem tells us

### 3. The Multivariate Normal Distribution

3. The Multivariate Normal Distribution 3.1 Introduction A generalization of the familiar bell shaped normal density to several dimensions plays a fundamental role in multivariate analysis While real data

### Definition: Suppose that two random variables, either continuous or discrete, X and Y have joint density

HW MATH 461/561 Lecture Notes 15 1 Definition: Suppose that two random variables, either continuous or discrete, X and Y have joint density and marginal densities f(x, y), (x, y) Λ X,Y f X (x), x Λ X,

### Random variables P(X = 3) = P(X = 3) = 1 8, P(X = 1) = P(X = 1) = 3 8.

Random variables Remark on Notations 1. When X is a number chosen uniformly from a data set, What I call P(X = k) is called Freq[k, X] in the courseware. 2. When X is a random variable, what I call F ()

### The Binomial Probability Distribution

The Binomial Probability Distribution MATH 130, Elements of Statistics I J. Robert Buchanan Department of Mathematics Fall 2015 Objectives After this lesson we will be able to: determine whether a probability

### ( ) = P Z > = P( Z > 1) = 1 Φ(1) = 1 0.8413 = 0.1587 P X > 17

4.6 I company that manufactures and bottles of apple juice uses a machine that automatically fills 6 ounce bottles. There is some variation, however, in the amounts of liquid dispensed into the bottles

### Martingale Ideas in Elementary Probability. Lecture course Higher Mathematics College, Independent University of Moscow Spring 1996

Martingale Ideas in Elementary Probability Lecture course Higher Mathematics College, Independent University of Moscow Spring 1996 William Faris University of Arizona Fulbright Lecturer, IUM, 1995 1996

### Example: Boats and Manatees

Figure 9-6 Example: Boats and Manatees Slide 1 Given the sample data in Table 9-1, find the value of the linear correlation coefficient r, then refer to Table A-6 to determine whether there is a significant