Probability Review. Rob Hall. September 9, 2010


1 Probability Review Rob Hall September 9, 2010

2 What is Probability? Probability reasons about a sample, knowing the population. The goal of statistics is to estimate the population based on a sample. Both provide invaluable tools to modern machine learning.

3 Plan Facts about sets (to get our brains in gear). Definitions and facts about probabilities. Random variables and joint distributions. Characteristics of distributions (mean, variance, entropy). Some asymptotic results (a high-level perspective). Goals: get some intuition about probability, learn how to formulate a simple proof, lay out some useful identities for use as a reference. Non-goal: supplant an entire semester-long course in probability.

4 Set Basics A set is just a collection of elements, denoted e.g. S = {s_1, s_2, s_3}, R = {r : some condition holds on r}. Intersection: the elements that are in both sets: A ∩ B = {x : x ∈ A and x ∈ B}. Union: the elements that are in either set, or both: A ∪ B = {x : x ∈ A or x ∈ B}. Complementation: all the elements that aren't in the set: A^C = {x : x ∉ A}. [Venn diagrams of A^C, A ∪ B, and A ∩ B]

5-8 Properties of Set Operations Commutativity: A ∪ B = B ∪ A. Associativity: A ∪ (B ∪ C) = (A ∪ B) ∪ C. Likewise for intersection. Proof? Follows easily from the commutative and associative properties of "and" and "or" in the definitions. Distributive properties: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C), and A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C). Proof? Show each side of the equality contains the other. DeMorgan's laws: see book.
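These identities are easy to sanity-check mechanically. A small sketch using Python's built-in set type; the particular sets A, B, C and the universe S are arbitrary choices for illustration, not from the slides:

```python
# Check the distributive properties on arbitrary small sets.
A, B, C = {1, 2, 3}, {2, 3, 4}, {3, 4, 5}

lhs = A & (B | C)            # A ∩ (B ∪ C)
rhs = (A & B) | (A & C)      # (A ∩ B) ∪ (A ∩ C)
assert lhs == rhs

lhs2 = A | (B & C)           # A ∪ (B ∩ C)
rhs2 = (A | B) & (A | C)     # (A ∪ B) ∩ (A ∪ C)
assert lhs2 == rhs2

# One of DeMorgan's laws, taking complements relative to a universe S:
S = {1, 2, 3, 4, 5, 6}
assert S - (A | B) == (S - A) & (S - B)
```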

9-10 Disjointness and Partitions A sequence of sets A_1, A_2, ... is called pairwise disjoint or mutually exclusive if for all i ≠ j, A_i ∩ A_j = {}. If the sequence is pairwise disjoint and ∪_{i=1}^∞ A_i = S, then the sequence forms a partition of S. Partitions are useful in probability theory and in life:
B = B ∩ S = B ∩ (∪_{i=1}^∞ A_i) (def. of partition) = ∪_{i=1}^∞ (B ∩ A_i) (distributive property)
Note that the sets B ∩ A_i are also pairwise disjoint (proof?). If S is the whole space, what have we constructed?

11-14 Probability Terminology

Name                  What it is              Common symbols   What it means
Sample space          Set                     Ω, S             Possible outcomes.
Event space           Collection of subsets   F, E             The things that have probabilities.
Probability measure   Measure                 P, π             Assigns probabilities to events.
Probability space     A triple (Ω, F, P)

Remarks: we may consider the event space to be the power set of the sample space (for a discrete sample space; more later). e.g., rolling a fair die:
Ω = {1, 2, 3, 4, 5, 6}
F = 2^Ω = {{}, {1}, {2}, ..., {1, 2}, ..., {1, 2, 3}, ..., {1, 2, 3, 4, 5, 6}}
P({1}) = P({2}) = ... = P({6}) = 1/6 (i.e., a fair die)
P({1, 3, 5}) = 1/2 (i.e., half chance of an odd result)
P({1, 2, 3, 4, 5, 6}) = 1 (i.e., the result is almost surely one of the faces)
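The fair-die probability space can be written out directly; a minimal sketch in Python, where the function name P and the use of exact rational arithmetic are illustrative choices:

```python
from fractions import Fraction

# Sample space for a fair die; P is defined on arbitrary events (subsets of omega).
omega = {1, 2, 3, 4, 5, 6}

def P(event):
    """Probability of an event under the uniform (fair-die) measure."""
    return Fraction(len(event & omega), len(omega))

p_one = P({1})           # each face: 1/6
p_odd = P({1, 3, 5})     # odd result: 1/2
p_all = P(omega)         # the whole sample space: 1
```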

15-16 Axioms for Probability A set of conditions imposed on probability measures (due to Kolmogorov):
P(A) ≥ 0 for all A ∈ F
P(Ω) = 1
P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i) where {A_i}_{i=1}^∞ ⊆ F are pairwise disjoint.
These quickly lead to:
P(A^C) = 1 - P(A) (since P(A) + P(A^C) = P(A ∪ A^C) = P(Ω) = 1)
P(A) ≤ 1 (since P(A^C) ≥ 0)
P({}) = 0 (since P(Ω) = 1)

17-22 General Unions: P(A ∪ B) Recall that A, A^C form a partition of Ω:
B = B ∩ Ω = B ∩ (A ∪ A^C) = (B ∩ A) ∪ (B ∩ A^C)
And so: P(B) = P(B ∩ A) + P(B ∩ A^C). For a general partition this is called the law of total probability.
P(A ∪ B) = P(A ∪ (B ∩ A^C)) = P(A) + P(B ∩ A^C) = P(A) + P(B) - P(B ∩ A) ≤ P(A) + P(B)
Very important difference between disjoint and non-disjoint unions. The same idea yields the so-called union bound, aka Boole's inequality.
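The inclusion-exclusion identity and the union bound can be checked by brute force on the die space; a sketch, borrowing the events A and B used in the conditional-probability example that follows:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}

def P(event):
    # Uniform measure on the fair-die sample space.
    return Fraction(len(event & omega), len(omega))

A = {1, 2, 3, 4}   # result is less than 5
B = {1, 3, 5}      # result is odd

# Inclusion-exclusion: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
p_union = P(A | B)
assert p_union == P(A) + P(B) - P(A & B)

# Union bound: P(A ∪ B) ≤ P(A) + P(B)
assert p_union <= P(A) + P(B)
```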

23-26 Conditional Probabilities For events A, B ∈ F with P(B) > 0, we may write the conditional probability of A given B:
P(A|B) = P(A ∩ B) / P(B)
Interpretation: the outcome is definitely in B, so treat B as the entire sample space and find the probability that the outcome is also in A.
This rapidly leads to: P(A|B)P(B) = P(A ∩ B), aka the chain rule for probabilities (why?).
When A_1, A_2, ... are a partition of Ω:
P(B) = Σ_{i=1}^∞ P(B ∩ A_i) = Σ_{i=1}^∞ P(B|A_i)P(A_i)
This is also referred to as the law of total probability.

27-34 Conditional Probability Example Suppose we throw a fair die: Ω = {1, 2, 3, 4, 5, 6}, F = 2^Ω, P({i}) = 1/6 for i = 1, ..., 6. Let A = {1, 2, 3, 4}, i.e., the result is less than 5, and B = {1, 3, 5}, i.e., the result is odd.
P(A) = 2/3
P(B) = 1/2
P(A|B) = P(A ∩ B)/P(B) = P({1, 3})/P(B) = (1/3)/(1/2) = 2/3
P(B|A) = P(A ∩ B)/P(A) = (1/3)/(2/3) = 1/2
Note that in general P(A|B) ≠ P(B|A); however, we may quantify their relationship.

35 Bayes' Rule Using the chain rule we may see: P(A|B)P(B) = P(A ∩ B) = P(B|A)P(A). Rearranging this yields Bayes' rule:
P(B|A) = P(A|B)P(B) / P(A)
Often this is written as:
P(B_i|A) = P(A|B_i)P(B_i) / Σ_i P(A|B_i)P(B_i)
where the B_i are a partition of Ω (note the denominator is just the law of total probability).
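Plugging the numbers from the preceding die example into Bayes' rule recovers P(B|A); a minimal sketch:

```python
from fractions import Fraction

# Values from the die example: A = {1,2,3,4} (less than 5), B = {1,3,5} (odd).
p_A = Fraction(2, 3)
p_B = Fraction(1, 2)
p_A_given_B = Fraction(2, 3)

# Bayes' rule: P(B|A) = P(A|B) P(B) / P(A)
p_B_given_A = p_A_given_B * p_B / p_A
# Matches the direct computation P(B|A) = 1/2 from the example.
```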

36-41 Independence Two events A, B are called independent if P(A ∩ B) = P(A)P(B). When P(A) > 0 this may be written P(B|A) = P(B) (why?). e.g., rolling two dice, flipping n coins, etc.
Two events A, B are called conditionally independent given C when P(A ∩ B | C) = P(A|C)P(B|C). When P(A ∩ C) > 0 we may write P(B|A, C) = P(B|C). e.g., the weather tomorrow is independent of the weather yesterday, knowing the weather today.

42 Random Variables (caution: hand-waving) A random variable is a function X : Ω → ℝ^d. e.g., roll some dice, X = sum of the numbers. Indicators of events: X(ω) = 1_A(ω); e.g., toss a coin, X = 1 if it came up heads, 0 otherwise. Note the relationship between the set-theoretic constructions and binary RVs. Give a few monkeys a typewriter, X = fraction of overlap with the complete works of Shakespeare. Throw a dart at a board, X ∈ ℝ² are the coordinates which are hit.

43 Distributions By considering random variables, we may think of probability measures as functions on the real numbers. Then the probability measure associated with the RV is completely characterized by its cumulative distribution function (CDF): F_X(x) = P(X ≤ x). If two RVs have the same CDF we call them identically distributed. We say X ~ F_X or X ~ f_X (f_X coming soon) to indicate that X has the distribution specified by F_X (resp. f_X). [Plots of two example CDFs F_X(x)]

44-47 Discrete Distributions If X takes on only a countable number of values, then we may characterize it by a probability mass function (PMF), which describes the probability of each value: f_X(x) = P(X = x).
We have: Σ_x f_X(x) = 1 (why?). Since each ω maps to one x, and P(Ω) = 1.
e.g., general discrete PMF: f_X(x_i) = θ_i, Σ_i θ_i = 1, θ_i ≥ 0.
e.g., Bernoulli distribution: X ∈ {0, 1}, f_X(x) = θ^x (1 - θ)^(1-x). A general model of binary outcomes (coin flips etc.).
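A quick check that the Bernoulli PMF behaves as stated; θ = 0.3 is an arbitrary illustrative value:

```python
# Bernoulli PMF: f_X(x) = θ^x (1 - θ)^(1 - x) for x ∈ {0, 1}.
def bernoulli_pmf(x, theta):
    return theta**x * (1 - theta)**(1 - x)

theta = 0.3  # arbitrary choice for illustration

# The PMF sums to 1 over the support, and the mean works out to θ.
total = bernoulli_pmf(0, theta) + bernoulli_pmf(1, theta)
mean = sum(x * bernoulli_pmf(x, theta) for x in (0, 1))
```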

48-49 Discrete Distributions Rather than specifying each probability for each event, we may consider a more restrictive parametric form, which will be easier to specify and manipulate (but sometimes less general).
e.g., multinomial distribution: X ∈ ℕ^d, Σ_{i=1}^d x_i = n,
f_X(x) = (n! / (x_1! x_2! ··· x_d!)) Π_{i=1}^d θ_i^(x_i)
Sometimes used in text processing (dimensions correspond to words, n is the length of a document). What have we lost in going from a general form to a multinomial?

50-54 Continuous Distributions When the CDF is continuous we may consider its derivative f_X(x) = (d/dx) F_X(x). This is called the probability density function (PDF).
The probability of an interval (a, b) is given by P(a < X < b) = ∫_a^b f_X(x) dx. The probability of any specific point c is zero: P(X = c) = 0 (why?).
e.g., uniform distribution: f_X(x) = (1/(b - a)) 1_(a,b)(x)
e.g., Gaussian aka normal: f_X(x) = (1/(σ√(2π))) exp{-(x - µ)²/(2σ²)}
Note that both families give probabilities for every interval on the real line, yet are specified by only two numbers. [Plot of the standard normal density]
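The interval probability P(a < X < b) = ∫_a^b f_X(x) dx can be approximated numerically; a sketch using trapezoidal integration of the Gaussian density, where the step count is an arbitrary choice:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # Gaussian density: (1/(σ√(2π))) exp{-(x - µ)²/(2σ²)}
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def prob_interval(a, b, mu=0.0, sigma=1.0, steps=10_000):
    """P(a < X < b) by trapezoidal integration of the PDF."""
    h = (b - a) / steps
    ys = [normal_pdf(a + i * h, mu, sigma) for i in range(steps + 1)]
    return h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

# For the standard normal, the interval (-1.96, 1.96) carries about 0.95 probability.
p = prob_interval(-1.96, 1.96)
```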

55 Multiple Random Variables We may consider multiple functions of the same sample space, e.g., X(ω) = 1_A(ω), Y(ω) = 1_B(ω). [Venn diagram of A and B] May represent the joint distribution as a table: [2×2 table of P(X = x, Y = y) for x, y ∈ {0, 1}] We write the joint PMF or PDF as f_X,Y(x, y).

56 Multiple Random Variables Two random variables are called independent when the joint PDF factorizes: f_X,Y(x, y) = f_X(x)f_Y(y). When RVs are independent and identically distributed this is usually abbreviated to i.i.d. Relationship to independent events: X, Y are independent iff {ω : X(ω) ≤ x}, {ω : Y(ω) ≤ y} are independent events for all x, y.

57 Working with a Joint Distribution We have similar constructions as we did in abstract probability spaces:
Marginalizing: f_X(x) = ∫_Y f_X,Y(x, y) dy. Similar idea to the law of total probability (identical for a discrete distribution).
Conditioning: f_{X|Y}(x, y) = f_X,Y(x, y)/f_Y(y) = f_X,Y(x, y)/∫_X f_X,Y(x, y) dx. Similar to the previous definition.
[Table of a joint distribution over Old?, Blood pressure?, Heart attack?] How to compute P(heart attack | old)?
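Marginalizing and conditioning on a discrete joint table can be sketched as follows; the joint probabilities over (old, heart attack) are made up purely for illustration and are not the values from the slide:

```python
# Hypothetical joint PMF over X ∈ {0, 1} ("old?") and Y ∈ {0, 1} ("heart attack?").
# These numbers are invented for illustration only.
joint = {(0, 0): 0.56, (0, 1): 0.04, (1, 0): 0.30, (1, 1): 0.10}

# Marginalizing: f_X(x) = Σ_y f_X,Y(x, y)
f_X = {x: sum(p for (xx, y), p in joint.items() if xx == x) for x in (0, 1)}

# Conditioning: f_{Y|X}(y|x) = f_X,Y(x, y) / f_X(x)
def cond_Y_given_X(y, x):
    return joint[(x, y)] / f_X[x]

# P(heart attack | old) = 0.10 / 0.40 under these made-up numbers.
p_attack_given_old = cond_Y_given_X(1, 1)
```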

58-59 Characteristics of Distributions We may consider the expectation (or "mean") of a distribution:
E(X) = Σ_x x f_X(x) if X is discrete; E(X) = ∫ x f_X(x) dx if X is continuous.
Expectation is linear:
E(aX + bY + c) = Σ_{x,y} (ax + by + c) f_X,Y(x, y)
= Σ_{x,y} ax f_X,Y(x, y) + Σ_{x,y} by f_X,Y(x, y) + Σ_{x,y} c f_X,Y(x, y)
= a Σ_{x,y} x f_X,Y(x, y) + b Σ_{x,y} y f_X,Y(x, y) + c Σ_{x,y} f_X,Y(x, y)
= a Σ_x x Σ_y f_X,Y(x, y) + b Σ_y y Σ_x f_X,Y(x, y) + c
= a Σ_x x f_X(x) + b Σ_y y f_Y(y) + c
= a E(X) + b E(Y) + c
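Linearity also holds exactly for sample averages, which makes it easy to check empirically; a sketch with simulated draws, where the constants a, b, c and the sample size are arbitrary choices:

```python
import random

random.seed(0)

# Empirical check of E(aX + bY + c) = aE(X) + bE(Y) + c.
# The same identity holds exactly for sample means, up to float rounding.
n = 100_000
xs = [random.randint(1, 6) for _ in range(n)]  # die rolls
ys = [random.random() for _ in range(n)]       # uniforms on (0, 1)
a, b, c = 2.0, -3.0, 5.0

def mean(v):
    return sum(v) / len(v)

lhs = mean([a * x + b * y + c for x, y in zip(xs, ys)])
rhs = a * mean(xs) + b * mean(ys) + c
```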

60-64 Characteristics of Distributions Questions:
1. E[EX] = Σ_x (EX) f_X(x) = (EX) Σ_x f_X(x) = EX
2. E(XY) = E(X)E(Y)? Not in general, although when f_X,Y = f_X f_Y:
E(XY) = Σ_{x,y} xy f_X(x) f_Y(y) = Σ_x x f_X(x) Σ_y y f_Y(y) = EX · EY

65-67 Characteristics of Distributions We may consider the variance of a distribution: Var(X) = E(X - EX)². This may give an idea of how "spread out" a distribution is.
A useful alternate form is:
E(X - EX)² = E[X² - 2X E(X) + (EX)²] = E(X²) - 2E(X)E(X) + (EX)² = E(X²) - (EX)²
Variance of a coin toss?
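The "variance of a coin toss?" question can be answered numerically; a sketch for a Bernoulli(θ) toss, checking that both forms agree and give θ(1 - θ), which is 1/4 for a fair coin:

```python
# Variance of a coin toss, X ~ Bernoulli(θ), computed from the PMF both ways.
theta = 0.5  # a fair coin
pmf = {0: 1 - theta, 1: theta}

EX = sum(x * p for x, p in pmf.items())
var_direct = sum((x - EX)**2 * p for x, p in pmf.items())      # E(X - EX)²
var_alt = sum(x**2 * p for x, p in pmf.items()) - EX**2        # E(X²) - (EX)²
# Both equal θ(1 - θ) = 0.25 for θ = 0.5.
```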

68-71 Characteristics of Distributions Variance is non-linear, but the following holds:
Var(aX) = E(aX - E(aX))² = E(aX - aEX)² = a² E(X - EX)² = a² Var(X)
Var(X + c) = E(X + c - E(X + c))² = E(X - EX + c - c)² = E(X - EX)² = Var(X)
Var(X + Y) = E(X - EX + Y - EY)² = E(X - EX)² + E(Y - EY)² + 2 E[(X - EX)(Y - EY)] = Var(X) + Var(Y) + 2 Cov(X, Y)
So when X, Y are independent we have (why?): Var(X + Y) = Var(X) + Var(Y)

72-77 Putting it all together Say we have X_1, ..., X_n i.i.d., where EX_i = µ and Var(X_i) = σ². We want to know the expectation and variance of X̄_n = (1/n) Σ_{i=1}^n X_i.
E(X̄_n) = E[(1/n) Σ_{i=1}^n X_i] = (1/n) Σ_{i=1}^n E(X_i) = (1/n) nµ = µ
Var(X̄_n) = Var((1/n) Σ_{i=1}^n X_i) = (1/n²) Σ_{i=1}^n Var(X_i) = (1/n²) nσ² = σ²/n
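A Monte Carlo sanity check of Var(X̄_n) = σ²/n for die rolls, where µ = 3.5 and σ² = 35/12; the sample size and trial count are arbitrary choices:

```python
import random

random.seed(1)

# Estimate E(X̄_n) and Var(X̄_n) for the mean of n fair-die rolls.
n, trials = 50, 20_000
means = []
for _ in range(trials):
    rolls = [random.randint(1, 6) for _ in range(n)]
    means.append(sum(rolls) / n)

m = sum(means) / trials                            # should be near µ = 3.5
var_hat = sum((x - m)**2 for x in means) / trials  # should be near σ²/n
expected_var = (35 / 12) / n                       # σ²/n ≈ 0.0583
```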

78 Entropy of a Distribution Entropy is a measure of uniformity in a distribution: H(X) = -Σ_x f_X(x) log_2 f_X(x). Imagine you had to transmit a sample from f_X, so you construct the optimal encoding scheme: entropy gives the mean depth in the code tree (= mean number of bits).
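A sketch of the entropy formula; the example distributions are illustrative, and the fair-coin case gives exactly 1 bit:

```python
import math

def entropy(pmf):
    """H(X) = -Σ_x f(x) log2 f(x), with 0 log 0 taken as 0."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

h_coin = entropy([0.5, 0.5])    # fair coin: 1 bit
h_die = entropy([1/6] * 6)      # fair die: log2(6) ≈ 2.585 bits
h_skewed = entropy([0.9, 0.1])  # less uniform, so less entropy than the fair coin
```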

79-81 Law of Large Numbers (LLN) Recall our variable X̄_n = (1/n) Σ_{i=1}^n X_i. We may wonder about its behavior as n → ∞.
We had: E(X̄_n) = µ, Var(X̄_n) = σ²/n. The distribution appears to be contracting: as n increases, the variance is going to 0.
Using Chebyshev's inequality: P(|X̄_n - µ| ≥ ε) ≤ σ²/(nε²) → 0 for any fixed ε, as n → ∞.

82-85 Law of Large Numbers (LLN) The weak law of large numbers:
lim_{n→∞} P(|X̄_n - µ| < ε) = 1
In English: choose ε and a probability that |X̄_n - µ| < ε, and I can find you an n so your probability is achieved.
The strong law of large numbers:
P(lim_{n→∞} X̄_n = µ) = 1
In English: the mean converges to the expectation almost surely as n increases.
Two different versions, each holds under different conditions, but i.i.d. and finite variance is enough for either.
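The weak law can be illustrated by simulation; a sketch estimating P(|X̄_n - µ| < ε) for fair-die rolls at two sample sizes, where ε, the trial count, and the two values of n are arbitrary choices:

```python
import random

random.seed(2)

# Estimate P(|X̄_n - µ| < ε) for the mean of n die rolls (µ = 3.5).
mu, eps, trials = 3.5, 0.25, 2_000

def frac_within(n):
    hits = 0
    for _ in range(trials):
        xbar = sum(random.randint(1, 6) for _ in range(n)) / n
        hits += abs(xbar - mu) < eps
    return hits / trials

# The fraction within ε of µ should grow toward 1 as n increases.
small, large = frac_within(10), frac_within(500)
```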

86-88 Central Limit Theorem (CLT) The distribution of X̄_n also converges weakly to a Gaussian; standardizing the sample mean,
lim_{n→∞} P(√n (X̄_n - µ)/σ ≤ x) = Φ(x)
Simulated n dice rolls and took the average, 5000 times: [Histograms of the average of n dice rolls, for n = 1, 2, 10, 75]
Two kinds of convergence went into this picture (why 5000?):
1. The true distribution converges to a Gaussian (CLT).
2. The empirical distribution converges to the true distribution (Glivenko-Cantelli).
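The simulation described above can be sketched in code, comparing the empirical CDF of the standardized mean to Φ at a couple of points; the seed, n, and trial count are arbitrary choices:

```python
import math
import random

random.seed(3)

# Standardized means of n die rolls: z = √n (X̄_n - µ)/σ, with µ = 3.5, σ² = 35/12.
mu, sigma = 3.5, math.sqrt(35 / 12)
n, trials = 75, 5_000

zs = []
for _ in range(trials):
    xbar = sum(random.randint(1, 6) for _ in range(n)) / n
    zs.append(math.sqrt(n) * (xbar - mu) / sigma)

def ecdf(t):
    # Empirical CDF of the standardized means.
    return sum(z <= t for z in zs) / trials

# By the CLT, ecdf(0) ≈ Φ(0) = 0.5 and ecdf(1) ≈ Φ(1) ≈ 0.841.
```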

89 Asymptotics Opinion Ideas like these are crucial to machine learning: we want to minimize error on a whole population (e.g., classify text documents as well as possible), but we minimize error on a training set of size n. What happens as n → ∞? How does the complexity of the model, or the dimension of the problem, affect convergence?


THE CENTRAL LIMIT THEOREM TORONTO THE CENTRAL LIMIT THEOREM DANIEL RÜDT UNIVERSITY OF TORONTO MARCH, 2010 Contents 1 Introduction 1 2 Mathematical Background 3 3 The Central Limit Theorem 4 4 Examples 4 4.1 Roulette......................................

More information

Stat 704 Data Analysis I Probability Review

Stat 704 Data Analysis I Probability Review 1 / 30 Stat 704 Data Analysis I Probability Review Timothy Hanson Department of Statistics, University of South Carolina Course information 2 / 30 Logistics: Tuesday/Thursday 11:40am to 12:55pm in LeConte

More information

Overview of Monte Carlo Simulation, Probability Review and Introduction to Matlab

Overview of Monte Carlo Simulation, Probability Review and Introduction to Matlab Monte Carlo Simulation: IEOR E4703 Fall 2004 c 2004 by Martin Haugh Overview of Monte Carlo Simulation, Probability Review and Introduction to Matlab 1 Overview of Monte Carlo Simulation 1.1 Why use simulation?

More information

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 17 Shannon-Fano-Elias Coding and Introduction to Arithmetic Coding

More information

Math/Stats 425 Introduction to Probability. 1. Uncertainty and the axioms of probability

Math/Stats 425 Introduction to Probability. 1. Uncertainty and the axioms of probability Math/Stats 425 Introduction to Probability 1. Uncertainty and the axioms of probability Processes in the real world are random if outcomes cannot be predicted with certainty. Example: coin tossing, stock

More information

Lecture Notes 1. Brief Review of Basic Probability

Lecture Notes 1. Brief Review of Basic Probability Probability Review Lecture Notes Brief Review of Basic Probability I assume you know basic probability. Chapters -3 are a review. I will assume you have read and understood Chapters -3. Here is a very

More information

Definition and Calculus of Probability

Definition and Calculus of Probability In experiments with multivariate outcome variable, knowledge of the value of one variable may help predict another. For now, the word prediction will mean update the probabilities of events regarding the

More information

Gambling and Data Compression

Gambling and Data Compression Gambling and Data Compression Gambling. Horse Race Definition The wealth relative S(X) = b(x)o(x) is the factor by which the gambler s wealth grows if horse X wins the race, where b(x) is the fraction

More information

STA 256: Statistics and Probability I

STA 256: Statistics and Probability I Al Nosedal. University of Toronto. Fall 2014 1 2 3 4 5 My momma always said: Life was like a box of chocolates. You never know what you re gonna get. Forrest Gump. Experiment, outcome, sample space, and

More information

Lecture Note 1 Set and Probability Theory. MIT 14.30 Spring 2006 Herman Bennett

Lecture Note 1 Set and Probability Theory. MIT 14.30 Spring 2006 Herman Bennett Lecture Note 1 Set and Probability Theory MIT 14.30 Spring 2006 Herman Bennett 1 Set Theory 1.1 Definitions and Theorems 1. Experiment: any action or process whose outcome is subject to uncertainty. 2.

More information

Sums of Independent Random Variables

Sums of Independent Random Variables Chapter 7 Sums of Independent Random Variables 7.1 Sums of Discrete Random Variables In this chapter we turn to the important question of determining the distribution of a sum of independent random variables

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES Contents 1. Random variables and measurable functions 2. Cumulative distribution functions 3. Discrete

More information

Elements of probability theory

Elements of probability theory 2 Elements of probability theory Probability theory provides mathematical models for random phenomena, that is, phenomena which under repeated observations yield di erent outcomes that cannot be predicted

More information

Probability and statistics; Rehearsal for pattern recognition

Probability and statistics; Rehearsal for pattern recognition Probability and statistics; Rehearsal for pattern recognition Václav Hlaváč Czech Technical University in Prague Faculty of Electrical Engineering, Department of Cybernetics Center for Machine Perception

More information

10-601. Machine Learning. http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html

10-601. Machine Learning. http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html 10-601 Machine Learning http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html Course data All up-to-date info is on the course web page: http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html

More information

Aggregate Loss Models

Aggregate Loss Models Aggregate Loss Models Chapter 9 Stat 477 - Loss Models Chapter 9 (Stat 477) Aggregate Loss Models Brian Hartman - BYU 1 / 22 Objectives Objectives Individual risk model Collective risk model Computing

More information

Probability & Probability Distributions

Probability & Probability Distributions Probability & Probability Distributions Carolyn J. Anderson EdPsych 580 Fall 2005 Probability & Probability Distributions p. 1/61 Probability & Probability Distributions Elementary Probability Theory Definitions

More information

e.g. arrival of a customer to a service station or breakdown of a component in some system.

e.g. arrival of a customer to a service station or breakdown of a component in some system. Poisson process Events occur at random instants of time at an average rate of λ events per second. e.g. arrival of a customer to a service station or breakdown of a component in some system. Let N(t) be

More information

IEOR 6711: Stochastic Models I Fall 2012, Professor Whitt, Tuesday, September 11 Normal Approximations and the Central Limit Theorem

IEOR 6711: Stochastic Models I Fall 2012, Professor Whitt, Tuesday, September 11 Normal Approximations and the Central Limit Theorem IEOR 6711: Stochastic Models I Fall 2012, Professor Whitt, Tuesday, September 11 Normal Approximations and the Central Limit Theorem Time on my hands: Coin tosses. Problem Formulation: Suppose that I have

More information

IAM 530 ELEMENTS OF PROBABILITY AND STATISTICS INTRODUCTION

IAM 530 ELEMENTS OF PROBABILITY AND STATISTICS INTRODUCTION IAM 530 ELEMENTS OF PROBABILITY AND STATISTICS INTRODUCTION 1 WHAT IS STATISTICS? Statistics is a science of collecting data, organizing and describing it and drawing conclusions from it. That is, statistics

More information

Lecture 7: Continuous Random Variables

Lecture 7: Continuous Random Variables Lecture 7: Continuous Random Variables 21 September 2005 1 Our First Continuous Random Variable The back of the lecture hall is roughly 10 meters across. Suppose it were exactly 10 meters, and consider

More information

An Introduction to Basic Statistics and Probability

An Introduction to Basic Statistics and Probability An Introduction to Basic Statistics and Probability Shenek Heyward NCSU An Introduction to Basic Statistics and Probability p. 1/4 Outline Basic probability concepts Conditional probability Discrete Random

More information

Lecture 1 Introduction Properties of Probability Methods of Enumeration Asrat Temesgen Stockholm University

Lecture 1 Introduction Properties of Probability Methods of Enumeration Asrat Temesgen Stockholm University Lecture 1 Introduction Properties of Probability Methods of Enumeration Asrat Temesgen Stockholm University 1 Chapter 1 Probability 1.1 Basic Concepts In the study of statistics, we consider experiments

More information

University of California, Los Angeles Department of Statistics. Random variables

University of California, Los Angeles Department of Statistics. Random variables University of California, Los Angeles Department of Statistics Statistics Instructor: Nicolas Christou Random variables Discrete random variables. Continuous random variables. Discrete random variables.

More information

Lecture 9: Introduction to Pattern Analysis

Lecture 9: Introduction to Pattern Analysis Lecture 9: Introduction to Pattern Analysis g Features, patterns and classifiers g Components of a PR system g An example g Probability definitions g Bayes Theorem g Gaussian densities Features, patterns

More information

Example. A casino offers the following bets (the fairest bets in the casino!) 1 You get $0 (i.e., you can walk away)

Example. A casino offers the following bets (the fairest bets in the casino!) 1 You get $0 (i.e., you can walk away) : Three bets Math 45 Introduction to Probability Lecture 5 Kenneth Harris aharri@umich.edu Department of Mathematics University of Michigan February, 009. A casino offers the following bets (the fairest

More information

SF2940: Probability theory Lecture 8: Multivariate Normal Distribution

SF2940: Probability theory Lecture 8: Multivariate Normal Distribution SF2940: Probability theory Lecture 8: Multivariate Normal Distribution Timo Koski 24.09.2015 Timo Koski Matematisk statistik 24.09.2015 1 / 1 Learning outcomes Random vectors, mean vector, covariance matrix,

More information

STAT 315: HOW TO CHOOSE A DISTRIBUTION FOR A RANDOM VARIABLE

STAT 315: HOW TO CHOOSE A DISTRIBUTION FOR A RANDOM VARIABLE STAT 315: HOW TO CHOOSE A DISTRIBUTION FOR A RANDOM VARIABLE TROY BUTLER 1. Random variables and distributions We are often presented with descriptions of problems involving some level of uncertainty about

More information

Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 18. A Brief Introduction to Continuous Probability

Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 18. A Brief Introduction to Continuous Probability CS 7 Discrete Mathematics and Probability Theory Fall 29 Satish Rao, David Tse Note 8 A Brief Introduction to Continuous Probability Up to now we have focused exclusively on discrete probability spaces

More information

Question: What is the probability that a five-card poker hand contains a flush, that is, five cards of the same suit?

Question: What is the probability that a five-card poker hand contains a flush, that is, five cards of the same suit? ECS20 Discrete Mathematics Quarter: Spring 2007 Instructor: John Steinberger Assistant: Sophie Engle (prepared by Sophie Engle) Homework 8 Hints Due Wednesday June 6 th 2007 Section 6.1 #16 What is the

More information

Section 5.1 Continuous Random Variables: Introduction

Section 5.1 Continuous Random Variables: Introduction Section 5. Continuous Random Variables: Introduction Not all random variables are discrete. For example:. Waiting times for anything (train, arrival of customer, production of mrna molecule from gene,

More information

Random Variables. Chapter 2. Random Variables 1

Random Variables. Chapter 2. Random Variables 1 Random Variables Chapter 2 Random Variables 1 Roulette and Random Variables A Roulette wheel has 38 pockets. 18 of them are red and 18 are black; these are numbered from 1 to 36. The two remaining pockets

More information

Probability: Terminology and Examples Class 2, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom

Probability: Terminology and Examples Class 2, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom Probability: Terminology and Examples Class 2, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom 1 Learning Goals 1. Know the definitions of sample space, event and probability function. 2. Be able to

More information

Basic Probability Concepts

Basic Probability Concepts page 1 Chapter 1 Basic Probability Concepts 1.1 Sample and Event Spaces 1.1.1 Sample Space A probabilistic (or statistical) experiment has the following characteristics: (a) the set of all possible outcomes

More information

4.1 4.2 Probability Distribution for Discrete Random Variables

4.1 4.2 Probability Distribution for Discrete Random Variables 4.1 4.2 Probability Distribution for Discrete Random Variables Key concepts: discrete random variable, probability distribution, expected value, variance, and standard deviation of a discrete random variable.

More information

Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 13. Random Variables: Distribution and Expectation

Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 13. Random Variables: Distribution and Expectation CS 70 Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 3 Random Variables: Distribution and Expectation Random Variables Question: The homeworks of 20 students are collected

More information

APPLIED MATHEMATICS ADVANCED LEVEL

APPLIED MATHEMATICS ADVANCED LEVEL APPLIED MATHEMATICS ADVANCED LEVEL INTRODUCTION This syllabus serves to examine candidates knowledge and skills in introductory mathematical and statistical methods, and their applications. For applications

More information

Lecture 4: BK inequality 27th August and 6th September, 2007

Lecture 4: BK inequality 27th August and 6th September, 2007 CSL866: Percolation and Random Graphs IIT Delhi Amitabha Bagchi Scribe: Arindam Pal Lecture 4: BK inequality 27th August and 6th September, 2007 4. Preliminaries The FKG inequality allows us to lower bound

More information

Mathematical Expectation

Mathematical Expectation Mathematical Expectation Properties of Mathematical Expectation I The concept of mathematical expectation arose in connection with games of chance. In its simplest form, mathematical expectation is the

More information

Lecture 8. Confidence intervals and the central limit theorem

Lecture 8. Confidence intervals and the central limit theorem Lecture 8. Confidence intervals and the central limit theorem Mathematical Statistics and Discrete Mathematics November 25th, 2015 1 / 15 Central limit theorem Let X 1, X 2,... X n be a random sample of

More information

Probability Generating Functions

Probability Generating Functions page 39 Chapter 3 Probability Generating Functions 3 Preamble: Generating Functions Generating functions are widely used in mathematics, and play an important role in probability theory Consider a sequence

More information

Statistics 100A Homework 4 Solutions

Statistics 100A Homework 4 Solutions Problem 1 For a discrete random variable X, Statistics 100A Homework 4 Solutions Ryan Rosario Note that all of the problems below as you to prove the statement. We are proving the properties of epectation

More information

Hypothesis Testing for Beginners

Hypothesis Testing for Beginners Hypothesis Testing for Beginners Michele Piffer LSE August, 2011 Michele Piffer (LSE) Hypothesis Testing for Beginners August, 2011 1 / 53 One year ago a friend asked me to put down some easy-to-read notes

More information

UNIT I: RANDOM VARIABLES PART- A -TWO MARKS

UNIT I: RANDOM VARIABLES PART- A -TWO MARKS UNIT I: RANDOM VARIABLES PART- A -TWO MARKS 1. Given the probability density function of a continuous random variable X as follows f(x) = 6x (1-x) 0

More information

1. (First passage/hitting times/gambler s ruin problem:) Suppose that X has a discrete state space and let i be a fixed state. Let

1. (First passage/hitting times/gambler s ruin problem:) Suppose that X has a discrete state space and let i be a fixed state. Let Copyright c 2009 by Karl Sigman 1 Stopping Times 1.1 Stopping Times: Definition Given a stochastic process X = {X n : n 0}, a random time τ is a discrete random variable on the same probability space as

More information

Notes on Probability Theory

Notes on Probability Theory Notes on Probability Theory Christopher King Department of Mathematics Northeastern University July 31, 2009 Abstract These notes are intended to give a solid introduction to Probability Theory with a

More information

. (3.3) n Note that supremum (3.2) must occur at one of the observed values x i or to the left of x i.

. (3.3) n Note that supremum (3.2) must occur at one of the observed values x i or to the left of x i. Chapter 3 Kolmogorov-Smirnov Tests There are many situations where experimenters need to know what is the distribution of the population of their interest. For example, if they want to use a parametric

More information

Martingale Ideas in Elementary Probability. Lecture course Higher Mathematics College, Independent University of Moscow Spring 1996

Martingale Ideas in Elementary Probability. Lecture course Higher Mathematics College, Independent University of Moscow Spring 1996 Martingale Ideas in Elementary Probability Lecture course Higher Mathematics College, Independent University of Moscow Spring 1996 William Faris University of Arizona Fulbright Lecturer, IUM, 1995 1996

More information

Probability for Computer Scientists

Probability for Computer Scientists Probability for Computer Scientists This material is provided for the educational use of students in CSE2400 at FIT. No further use or reproduction is permitted. Copyright G.A.Marin, 2008, All rights reserved.

More information

Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 10

Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 10 CS 70 Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 10 Introduction to Discrete Probability Probability theory has its origins in gambling analyzing card games, dice,

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for

More information

Lecture 3: Continuous distributions, expected value & mean, variance, the normal distribution

Lecture 3: Continuous distributions, expected value & mean, variance, the normal distribution Lecture 3: Continuous distributions, expected value & mean, variance, the normal distribution 8 October 2007 In this lecture we ll learn the following: 1. how continuous probability distributions differ

More information

Chapter ML:IV. IV. Statistical Learning. Probability Basics Bayes Classification Maximum a-posteriori Hypotheses

Chapter ML:IV. IV. Statistical Learning. Probability Basics Bayes Classification Maximum a-posteriori Hypotheses Chapter ML:IV IV. Statistical Learning Probability Basics Bayes Classification Maximum a-posteriori Hypotheses ML:IV-1 Statistical Learning STEIN 2005-2015 Area Overview Mathematics Statistics...... Stochastics

More information

Non Parametric Inference

Non Parametric Inference Maura Department of Economics and Finance Università Tor Vergata Outline 1 2 3 Inverse distribution function Theorem: Let U be a uniform random variable on (0, 1). Let X be a continuous random variable

More information

MATH 140 Lab 4: Probability and the Standard Normal Distribution

MATH 140 Lab 4: Probability and the Standard Normal Distribution MATH 140 Lab 4: Probability and the Standard Normal Distribution Problem 1. Flipping a Coin Problem In this problem, we want to simualte the process of flipping a fair coin 1000 times. Note that the outcomes

More information

Statistics 100A Homework 7 Solutions

Statistics 100A Homework 7 Solutions Chapter 6 Statistics A Homework 7 Solutions Ryan Rosario. A television store owner figures that 45 percent of the customers entering his store will purchase an ordinary television set, 5 percent will purchase

More information

Data Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber 2011 1

Data Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber 2011 1 Data Modeling & Analysis Techniques Probability & Statistics Manfred Huber 2011 1 Probability and Statistics Probability and statistics are often used interchangeably but are different, related fields

More information

Math 370, Actuarial Problemsolving Spring 2008 A.J. Hildebrand. Practice Test, 1/28/2008 (with solutions)

Math 370, Actuarial Problemsolving Spring 2008 A.J. Hildebrand. Practice Test, 1/28/2008 (with solutions) Math 370, Actuarial Problemsolving Spring 008 A.J. Hildebrand Practice Test, 1/8/008 (with solutions) About this test. This is a practice test made up of a random collection of 0 problems from past Course

More information

Bayesian Tutorial (Sheet Updated 20 March)

Bayesian Tutorial (Sheet Updated 20 March) Bayesian Tutorial (Sheet Updated 20 March) Practice Questions (for discussing in Class) Week starting 21 March 2016 1. What is the probability that the total of two dice will be greater than 8, given that

More information

Ex. 2.1 (Davide Basilio Bartolini)

Ex. 2.1 (Davide Basilio Bartolini) ECE 54: Elements of Information Theory, Fall 00 Homework Solutions Ex.. (Davide Basilio Bartolini) Text Coin Flips. A fair coin is flipped until the first head occurs. Let X denote the number of flips

More information

Probability density function : An arbitrary continuous random variable X is similarly described by its probability density function f x = f X

Probability density function : An arbitrary continuous random variable X is similarly described by its probability density function f x = f X Week 6 notes : Continuous random variables and their probability densities WEEK 6 page 1 uniform, normal, gamma, exponential,chi-squared distributions, normal approx'n to the binomial Uniform [,1] random

More information

INTRODUCTORY SET THEORY

INTRODUCTORY SET THEORY M.Sc. program in mathematics INTRODUCTORY SET THEORY Katalin Károlyi Department of Applied Analysis, Eötvös Loránd University H-1088 Budapest, Múzeum krt. 6-8. CONTENTS 1. SETS Set, equal sets, subset,

More information

Discrete Math in Computer Science Homework 7 Solutions (Max Points: 80)

Discrete Math in Computer Science Homework 7 Solutions (Max Points: 80) Discrete Math in Computer Science Homework 7 Solutions (Max Points: 80) CS 30, Winter 2016 by Prasad Jayanti 1. (10 points) Here is the famous Monty Hall Puzzle. Suppose you are on a game show, and you

More information

6.041/6.431 Spring 2008 Quiz 2 Wednesday, April 16, 7:30-9:30 PM. SOLUTIONS

6.041/6.431 Spring 2008 Quiz 2 Wednesday, April 16, 7:30-9:30 PM. SOLUTIONS 6.4/6.43 Spring 28 Quiz 2 Wednesday, April 6, 7:3-9:3 PM. SOLUTIONS Name: Recitation Instructor: TA: 6.4/6.43: Question Part Score Out of 3 all 36 2 a 4 b 5 c 5 d 8 e 5 f 6 3 a 4 b 6 c 6 d 6 e 6 Total

More information

Polarization codes and the rate of polarization

Polarization codes and the rate of polarization Polarization codes and the rate of polarization Erdal Arıkan, Emre Telatar Bilkent U., EPFL Sept 10, 2008 Channel Polarization Given a binary input DMC W, i.i.d. uniformly distributed inputs (X 1,...,

More information

Math 461 Fall 2006 Test 2 Solutions

Math 461 Fall 2006 Test 2 Solutions Math 461 Fall 2006 Test 2 Solutions Total points: 100. Do all questions. Explain all answers. No notes, books, or electronic devices. 1. [105+5 points] Assume X Exponential(λ). Justify the following two

More information

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special Distributions-VI Today, I am going to introduce

More information

Statistics 100A Homework 3 Solutions

Statistics 100A Homework 3 Solutions Chapter Statistics 00A Homework Solutions Ryan Rosario. Two balls are chosen randomly from an urn containing 8 white, black, and orange balls. Suppose that we win $ for each black ball selected and we

More information

MIMO CHANNEL CAPACITY

MIMO CHANNEL CAPACITY MIMO CHANNEL CAPACITY Ochi Laboratory Nguyen Dang Khoa (D1) 1 Contents Introduction Review of information theory Fixed MIMO channel Fading MIMO channel Summary and Conclusions 2 1. Introduction The use

More information

ECE302 Spring 2006 HW4 Solutions February 6, 2006 1

ECE302 Spring 2006 HW4 Solutions February 6, 2006 1 ECE302 Spring 2006 HW4 Solutions February 6, 2006 1 Solutions to HW4 Note: Most of these solutions were generated by R. D. Yates and D. J. Goodman, the authors of our textbook. I have added comments in

More information

Math/Stats 342: Solutions to Homework

Math/Stats 342: Solutions to Homework Math/Stats 342: Solutions to Homework Steven Miller (sjm1@williams.edu) November 17, 2011 Abstract Below are solutions / sketches of solutions to the homework problems from Math/Stats 342: Probability

More information

SF2940: Probability theory Lecture 8: Multivariate Normal Distribution

SF2940: Probability theory Lecture 8: Multivariate Normal Distribution SF2940: Probability theory Lecture 8: Multivariate Normal Distribution Timo Koski 24.09.2014 Timo Koski () Mathematisk statistik 24.09.2014 1 / 75 Learning outcomes Random vectors, mean vector, covariance

More information

The Binomial Distribution

The Binomial Distribution The Binomial Distribution James H. Steiger November 10, 00 1 Topics for this Module 1. The Binomial Process. The Binomial Random Variable. The Binomial Distribution (a) Computing the Binomial pdf (b) Computing

More information

ECE302 Spring 2006 HW3 Solutions February 2, 2006 1

ECE302 Spring 2006 HW3 Solutions February 2, 2006 1 ECE302 Spring 2006 HW3 Solutions February 2, 2006 1 Solutions to HW3 Note: Most of these solutions were generated by R. D. Yates and D. J. Goodman, the authors of our textbook. I have added comments in

More information

Math 370/408, Spring 2008 Prof. A.J. Hildebrand. Actuarial Exam Practice Problem Set 2 Solutions

Math 370/408, Spring 2008 Prof. A.J. Hildebrand. Actuarial Exam Practice Problem Set 2 Solutions Math 70/408, Spring 2008 Prof. A.J. Hildebrand Actuarial Exam Practice Problem Set 2 Solutions About this problem set: These are problems from Course /P actuarial exams that I have collected over the years,

More information

Basics of information theory and information complexity

Basics of information theory and information complexity Basics of information theory and information complexity a tutorial Mark Braverman Princeton University June 1, 2013 1 Part I: Information theory Information theory, in its modern format was introduced

More information

Examples: Joint Densities and Joint Mass Functions Example 1: X and Y are jointly continuous with joint pdf

Examples: Joint Densities and Joint Mass Functions Example 1: X and Y are jointly continuous with joint pdf AMS 3 Joe Mitchell Eamples: Joint Densities and Joint Mass Functions Eample : X and Y are jointl continuous with joint pdf f(,) { c 2 + 3 if, 2, otherwise. (a). Find c. (b). Find P(X + Y ). (c). Find marginal

More information

Probabilities. Probability of a event. From Random Variables to Events. From Random Variables to Events. Probability Theory I

Probabilities. Probability of a event. From Random Variables to Events. From Random Variables to Events. Probability Theory I Victor Adamchi Danny Sleator Great Theoretical Ideas In Computer Science Probability Theory I CS 5-25 Spring 200 Lecture Feb. 6, 200 Carnegie Mellon University We will consider chance experiments with

More information