Probability Review. Rob Hall. September 9, 2010
1 Probability Review Rob Hall September 9, 2010
2 What is Probability? Probability reasons about a sample, knowing the population. The goal of statistics is to estimate the population based on a sample. Both provide invaluable tools to modern machine learning.
3 Plan Facts about sets (to get our brains in gear). Definitions and facts about probabilities. Random variables and joint distributions. Characteristics of distributions (mean, variance, entropy). Some asymptotic results (a high-level perspective). Goals: get some intuition about probability, learn how to formulate a simple proof, and lay out some useful identities for use as a reference. Non-goal: supplant an entire semester-long course in probability.
4 Set Basics A set is just a collection of elements, denoted e.g. S = {s_1, s_2, s_3}, or R = {r : some condition holds on r}. Intersection: the elements that are in both sets: A ∩ B = {x : x ∈ A and x ∈ B}. Union: the elements that are in either set, or both: A ∪ B = {x : x ∈ A or x ∈ B}. Complementation: all the elements that aren't in the set: A^c = {x : x ∉ A}. [Venn diagrams of A ∩ B, A ∪ B, and A^c.]
5-8 Properties of Set Operations Commutativity: A ∪ B = B ∪ A. Associativity: A ∪ (B ∪ C) = (A ∪ B) ∪ C. Likewise for intersection. Proof? Follows easily from the commutative and associative properties of "and" and "or" in the definitions. Distributive properties: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C), A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C). Proof? Show each side of the equality contains the other. De Morgan's laws: see book.
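The set identities above can be checked directly with Python's built-in set type; this is a quick sketch with illustrative example sets, not something from the slides.

```python
# Check the distributive property A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
# on small example sets (values chosen only for illustration).
A = {1, 2, 3, 4}
B = {3, 4, 5}
C = {4, 6}

lhs = A & (B | C)        # A ∩ (B ∪ C)
rhs = (A & B) | (A & C)  # (A ∩ B) ∪ (A ∩ C)
assert lhs == rhs

# And its dual: A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
assert A | (B & C) == (A | B) & (A | C)
```

"Show each side contains the other" is exactly how set equality is tested here: `==` on Python sets checks mutual containment.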
9-10 Disjointness and Partitions A sequence of sets A_1, A_2, … is called pairwise disjoint or mutually exclusive if for all i ≠ j, A_i ∩ A_j = {}. If the sequence is pairwise disjoint and ∪_{i=1}^∞ A_i = S, then the sequence forms a partition of S. Partitions are useful in probability theory and in life: B ∩ S = B ∩ (∪_{i=1}^∞ A_i) (def. of partition) = ∪_{i=1}^∞ (B ∩ A_i) (distributive property). Note that the sets B ∩ A_i are also pairwise disjoint (proof?). If S is the whole space, what have we constructed?
11-14 Probability Terminology
Sample space: a set (symbols Ω, S); the possible outcomes.
Event space: a collection of subsets (symbols F, E); the things that have probabilities.
Probability measure: a measure (symbols P, π); assigns probabilities to events.
Probability space: the triple (Ω, F, P).
Remarks: may consider the event space to be the power set of the sample space (for a discrete sample space; more later). E.g., rolling a fair die: Ω = {1, 2, 3, 4, 5, 6}, F = 2^Ω = {{}, {1}, {2}, …, {1, 2}, …, {1, 2, 3}, …, {1, 2, 3, 4, 5, 6}}, P({1}) = P({2}) = … = 1/6 (i.e., a fair die), P({1, 3, 5}) = 1/2 (i.e., half chance of an odd result), P({1, 2, 3, 4, 5, 6}) = 1 (i.e., the result is almost surely one of the faces).
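The fair-die probability space can be built explicitly in a few lines; this sketch enumerates the power set as the event space and defines the counting measure, mirroring the example above.

```python
from fractions import Fraction
from itertools import chain, combinations

omega = {1, 2, 3, 4, 5, 6}  # sample space for one die roll

# Event space: the power set 2^Ω (fine for a discrete sample space).
events = [set(s) for s in chain.from_iterable(
    combinations(sorted(omega), r) for r in range(len(omega) + 1))]

def P(A):
    # Fair-die measure: P(A) = |A| / |Ω|.
    return Fraction(len(A), len(omega))

odd = {1, 3, 5}
assert P(odd) == Fraction(1, 2)   # half chance of an odd result
assert P(omega) == 1              # the result is almost surely some face
```

With |Ω| = 6 the event space has 2^6 = 64 events, matching the "{} through {1, …, 6}" listing on the slide.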
15-16 Axioms for Probability A set of conditions imposed on probability measures (due to Kolmogorov): P(A) ≥ 0 for all A ∈ F; P(Ω) = 1; P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i) whenever {A_i}_{i=1}^∞ ⊆ F are pairwise disjoint. These quickly lead to: P(A^c) = 1 − P(A) (since P(A) + P(A^c) = P(A ∪ A^c) = P(Ω) = 1). P(A) ≤ 1 (since P(A^c) ≥ 0). P({}) = 0 (since P(Ω) = 1).
17-22 General Unions: P(A ∪ B) Recall that A, A^c form a partition of Ω: B = B ∩ Ω = B ∩ (A ∪ A^c) = (B ∩ A) ∪ (B ∩ A^c). And so: P(B) = P(B ∩ A) + P(B ∩ A^c). For a general partition this is called the law of total probability. P(A ∪ B) = P(A ∪ (B ∩ A^c)) = P(A) + P(B ∩ A^c) = P(A) + P(B) − P(B ∩ A) ≤ P(A) + P(B). Very important difference between disjoint and non-disjoint unions. The same idea yields the so-called union bound, aka Boole's inequality.
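Inclusion-exclusion and the union bound can be verified on the die space; a small sketch with the counting measure (events chosen for illustration).

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}

def P(E):
    # Counting measure on a fair die.
    return Fraction(len(E), len(omega))

A = {1, 2, 3, 4}
B = {1, 3, 5}

# Inclusion-exclusion: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
assert P(A | B) == P(A) + P(B) - P(A & B)
# Union bound (Boole's inequality): P(A ∪ B) ≤ P(A) + P(B)
assert P(A | B) <= P(A) + P(B)
```

Here the bound is strict because A and B overlap in {1, 3}; for disjoint events it holds with equality.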
23-26 Conditional Probabilities For events A, B ∈ F with P(B) > 0, we may write the conditional probability of A given B: P(A | B) = P(A ∩ B) / P(B). Interpretation: the outcome is definitely in B, so treat B as the entire sample space and find the probability that the outcome is also in A. This rapidly leads to: P(A | B)P(B) = P(A ∩ B), aka the chain rule for probabilities (why?). When A_1, A_2, … are a partition of Ω: P(B) = Σ_{i=1}^∞ P(B ∩ A_i) = Σ_{i=1}^∞ P(B | A_i)P(A_i). This is also referred to as the law of total probability.
27-34 Conditional Probability Example Suppose we throw a fair die: Ω = {1, 2, 3, 4, 5, 6}, F = 2^Ω, P({i}) = 1/6 for i = 1, …, 6. A = {1, 2, 3, 4}, i.e., the result is less than 5; B = {1, 3, 5}, i.e., the result is odd. P(A) = 2/3. P(B) = 1/2. P(A | B) = P(A ∩ B)/P(B) = P({1, 3})/P(B) = 2/3. P(B | A) = P(A ∩ B)/P(A) = 1/2. Note that in general P(A | B) ≠ P(B | A); however, we may quantify their relationship.
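The example above can be reproduced exactly with rational arithmetic; a short sketch of the conditional-probability definition on the die space.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}

def P(E):
    return Fraction(len(E), len(omega))

def cond(A, B):
    # P(A | B) = P(A ∩ B) / P(B), which requires P(B) > 0.
    return P(A & B) / P(B)

A = {1, 2, 3, 4}   # result less than 5
B = {1, 3, 5}      # result odd

assert cond(A, B) == Fraction(2, 3)   # P(A | B)
assert cond(B, A) == Fraction(1, 2)   # P(B | A)
assert cond(A, B) != cond(B, A)       # conditioning is not symmetric
```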
35 Bayes Rule Using the chain rule we may see: P(A | B)P(B) = P(A ∩ B) = P(B | A)P(A). Rearranging this yields Bayes rule: P(B | A) = P(A | B)P(B) / P(A). Often this is written as: P(B_i | A) = P(A | B_i)P(B_i) / Σ_i P(A | B_i)P(B_i), where the B_i are a partition of Ω (note the denominator is just the law of total probability).
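The partition form of Bayes rule can be sketched on a hypothetical two-urn example (not from the slides): pick one of two urns with equal probability, then draw a ball, with "A" the event of drawing red; priors and likelihoods are made-up illustrative numbers.

```python
from fractions import Fraction

# Hypothetical setup: B_1, B_2 partition Ω (which urn was picked).
prior = {1: Fraction(1, 2), 2: Fraction(1, 2)}       # P(B_i)
likelihood = {1: Fraction(3, 4), 2: Fraction(1, 4)}  # P(A | B_i)

def posterior(i):
    # Bayes rule: P(B_i | A) = P(A | B_i) P(B_i) / Σ_j P(A | B_j) P(B_j),
    # with the law of total probability in the denominator.
    denom = sum(likelihood[j] * prior[j] for j in prior)
    return likelihood[i] * prior[i] / denom

assert posterior(1) == Fraction(3, 4)
assert sum(posterior(i) for i in prior) == 1  # posteriors form a distribution
```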
36-41 Independence Two events A, B are called independent if P(A ∩ B) = P(A)P(B). When P(A) > 0 this may be written P(B | A) = P(B) (why?). E.g., rolling two dice, flipping n coins, etc. Two events A, B are called conditionally independent given C when P(A ∩ B | C) = P(A | C)P(B | C). When P(A) > 0 we may write P(B | A, C) = P(B | C). E.g., the weather tomorrow is independent of the weather yesterday, given the weather today.
42 Random Variables (caution: hand waving) A random variable is a function X : Ω → R^d. E.g., roll some dice, X = sum of the numbers. Indicators of events: X(ω) = 1_A(ω); e.g., toss a coin, X = 1 if it came up heads, 0 otherwise. Note the relationship between the set-theoretic constructions and binary RVs. Give a few monkeys a typewriter, X = fraction of overlap with the complete works of Shakespeare. Throw a dart at a board, X ∈ R² are the coordinates which are hit.
43 Distributions By considering random variables, we may think of probability measures as functions on the real numbers. Then the probability measure associated with the RV is completely characterized by its cumulative distribution function (CDF): F_X(x) = P(X ≤ x). If two RVs have the same CDF we call them identically distributed. We say X ∼ F_X or X ∼ f_X (f_X coming soon) to indicate that X has the distribution specified by F_X (resp. f_X). [Plots of example CDFs F_X(x).]
44-47 Discrete Distributions If X takes on only a countable number of values, then we may characterize it by a probability mass function (PMF), which describes the probability of each value: f_X(x) = P(X = x). We have Σ_x f_X(x) = 1 (why?) since each ω maps to one x, and P(Ω) = 1. E.g., a general discrete PMF: f_X(x_i) = θ_i, Σ_i θ_i = 1, θ_i ≥ 0. E.g., the Bernoulli distribution: X ∈ {0, 1}, f_X(x) = θ^x (1 − θ)^(1−x). A general model of binary outcomes (coin flips etc.).
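The Bernoulli PMF is one line of code; a sketch checking that it matches the formula above and sums to one over its support.

```python
from fractions import Fraction

def bernoulli_pmf(x, theta):
    # f_X(x) = θ^x (1 − θ)^(1−x) for x ∈ {0, 1}
    assert x in (0, 1)
    return theta**x * (1 - theta)**(1 - x)

theta = Fraction(1, 3)  # illustrative parameter value
assert bernoulli_pmf(1, theta) == Fraction(1, 3)
assert bernoulli_pmf(0, theta) == Fraction(2, 3)
# The PMF sums to 1 over the support, as required.
assert bernoulli_pmf(0, theta) + bernoulli_pmf(1, theta) == 1
```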
48-49 Discrete Distributions Rather than specifying each probability for each event, we may consider a more restrictive parametric form, which will be easier to specify and manipulate (but sometimes less general). E.g., the multinomial distribution: X ∈ N^d, Σ_{i=1}^d x_i = n, f_X(x) = n!/(x_1! x_2! ⋯ x_d!) ∏_{i=1}^d θ_i^{x_i}. Sometimes used in text processing (dimensions correspond to words, n is the length of a document). What have we lost in going from a general form to a multinomial?
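The multinomial PMF above translates directly into code; a sketch with an illustrative three-dimensional parameter vector.

```python
from math import factorial
from fractions import Fraction

def multinomial_pmf(x, theta):
    # f_X(x) = n!/(x_1! ⋯ x_d!) ∏_i θ_i^{x_i}, with n = Σ_i x_i
    n = sum(x)
    coef = factorial(n)
    for xi in x:
        coef //= factorial(xi)
    p = Fraction(coef)
    for xi, ti in zip(x, theta):
        p *= Fraction(ti) ** xi
    return p

theta = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]  # Σ θ_i = 1
# Two draws of the first category: 2!/2! · (1/2)² = 1/4.
assert multinomial_pmf((2, 0, 0), theta) == Fraction(1, 4)
```

What is lost relative to a general PMF over count vectors: the d parameters θ_i fix the probability of every count vector, so arbitrary dependence between counts can no longer be expressed.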
50-54 Continuous Distributions When the CDF is continuous we may consider its derivative f_X(x) = (d/dx) F_X(x). This is called the probability density function (PDF). The probability of an interval (a, b) is given by P(a < X < b) = ∫_a^b f_X(x) dx. The probability of any specific point c is zero: P(X = c) = 0 (why?). E.g., the uniform distribution: f_X(x) = (1/(b − a)) 1_{(a,b)}(x). E.g., the Gaussian, aka normal: f_X(x) = (1/(√(2π) σ)) exp{−(x − µ)²/(2σ²)}. Note that both families give probabilities for every interval on the real line, yet are specified by only two numbers. [Plot of the standard normal density.]
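The relationship P(a < X < b) = ∫_a^b f_X(x) dx can be checked numerically; a sketch using the Gaussian density and a simple midpoint Riemann sum (the helper names and step count are just illustrative choices).

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    # f_X(x) = 1/(√(2π) σ) · exp(−(x − µ)² / (2σ²))
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

def interval_prob(f, a, b, steps=100_000):
    # P(a < X < b) ≈ ∫_a^b f(x) dx via a midpoint Riemann sum.
    h = (b - a) / steps
    return sum(f(a + (k + 0.5) * h) for k in range(steps)) * h

# For a standard Gaussian, P(−1 < X < 1) ≈ 0.6827.
p = interval_prob(gaussian_pdf, -1.0, 1.0)
assert abs(p - 0.6827) < 1e-3
# Shrinking the interval toward a point drives the probability to 0.
assert interval_prob(gaussian_pdf, 0.0, 1e-6) < 1e-5
```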
55 Multiple Random Variables We may consider multiple functions of the same sample space, e.g., X(ω) = 1_A(ω), Y(ω) = 1_B(ω). [Venn diagram of A and B.] May represent the joint distribution as a table with rows Y = 0, Y = 1 and columns X = 0, X = 1, whose entries are the probabilities P(X = x, Y = y). We write the joint PMF or PDF as f_{X,Y}(x, y).
56 Multiple Random Variables Two random variables are called independent when the joint PDF factorizes: f_{X,Y}(x, y) = f_X(x) f_Y(y). When RVs are independent and identically distributed, this is usually abbreviated to i.i.d. Relationship to independent events: X, Y are independent iff {ω : X(ω) ≤ x}, {ω : Y(ω) ≤ y} are independent events for all x, y.
57 Working with a Joint Distribution We have similar constructions as we did in abstract probability spaces. Marginalizing: f_X(x) = ∫_Y f_{X,Y}(x, y) dy. Similar idea to the law of total probability (identical for a discrete distribution). Conditioning: f_{X|Y}(x | y) = f_{X,Y}(x, y)/f_Y(y) = f_{X,Y}(x, y)/∫_X f_{X,Y}(x, y) dx. Similar to the previous definition. [Table of probabilities over Old?, Blood pressure?, Heart attack?] How to compute P(heart attack | old)?
58-59 Characteristics of Distributions We may consider the expectation (or "mean") of a distribution: E(X) = Σ_x x f_X(x) if X is discrete, ∫ x f_X(x) dx if X is continuous. Expectation is linear: E(aX + bY + c) = Σ_{x,y} (ax + by + c) f_{X,Y}(x, y) = Σ_{x,y} ax f_{X,Y}(x, y) + Σ_{x,y} by f_{X,Y}(x, y) + Σ_{x,y} c f_{X,Y}(x, y) = a Σ_{x,y} x f_{X,Y}(x, y) + b Σ_{x,y} y f_{X,Y}(x, y) + c Σ_{x,y} f_{X,Y}(x, y) = a Σ_x x Σ_y f_{X,Y}(x, y) + b Σ_y y Σ_x f_{X,Y}(x, y) + c = a Σ_x x f_X(x) + b Σ_y y f_Y(y) + c = aE(X) + bE(Y) + c.
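Linearity of expectation can be verified exactly on a small joint PMF; the table entries below are illustrative values (they sum to 1), not from the slides.

```python
from fractions import Fraction

# A small joint PMF f_{X,Y} over {0,1} × {0,1} (illustrative numbers).
f = {(0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4),
     (1, 0): Fraction(1, 6), (1, 1): Fraction(1, 3)}
assert sum(f.values()) == 1

def E(g):
    # Expectation of g(X, Y) under the joint PMF.
    return sum(g(x, y) * p for (x, y), p in f.items())

a, b, c = 2, -3, 5
lhs = E(lambda x, y: a * x + b * y + c)
rhs = a * E(lambda x, y: x) + b * E(lambda x, y: y) + c
assert lhs == rhs  # E(aX + bY + c) = aE(X) + bE(Y) + c
```

Note the check works even though X and Y here are not independent; linearity never requires independence.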
60-64 Characteristics of Distributions Questions: 1. E[EX] = Σ_x (EX) f_X(x) = (EX) Σ_x f_X(x) = EX. 2. E(XY) = E(X)E(Y)? Not in general, although when f_{X,Y} = f_X f_Y: E(XY) = Σ_{x,y} x y f_X(x) f_Y(y) = Σ_x x f_X(x) Σ_y y f_Y(y) = EX · EY.
65-67 Characteristics of Distributions We may consider the variance of a distribution: Var(X) = E(X − EX)². This may give an idea of how spread out a distribution is. A useful alternate form is: E(X − EX)² = E[X² − 2X E(X) + (EX)²] = E(X²) − 2E(X)E(X) + (EX)² = E(X²) − (EX)². Variance of a coin toss?
68-71 Characteristics of Distributions Variance is non-linear, but the following holds: Var(aX) = E(aX − E(aX))² = E(aX − aEX)² = a² E(X − EX)² = a² Var(X). Var(X + c) = E(X + c − E(X + c))² = E(X − EX + c − c)² = E(X − EX)² = Var(X). Var(X + Y) = E(X − EX + Y − EY)² = E(X − EX)² + E(Y − EY)² + 2 E[(X − EX)(Y − EY)], i.e., Var(X) + Var(Y) + 2 Cov(X, Y). So when X, Y are independent we have (why?): Var(X + Y) = Var(X) + Var(Y).
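The variance identities, and the coin-toss question above, can be checked exactly; a sketch for a fair coin X ∈ {0, 1}.

```python
from fractions import Fraction

# Fair-coin toss: f_X(0) = f_X(1) = 1/2.
f = {0: Fraction(1, 2), 1: Fraction(1, 2)}

def E(g):
    return sum(g(x) * p for x, p in f.items())

EX = E(lambda x: x)
var = E(lambda x: (x - EX) ** 2)
# Alternate form and the answer to "variance of a coin toss?":
assert var == E(lambda x: x**2) - EX**2 == Fraction(1, 4)

# Var(aX) = a² Var(X) and Var(X + c) = Var(X), with illustrative a, c.
a, c = 3, 7
EaX = E(lambda x: a * x)
assert E(lambda x: (a * x - EaX) ** 2) == a**2 * var
EXc = E(lambda x: x + c)
assert E(lambda x: (x + c - EXc) ** 2) == var
```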
72-77 Putting it all together Say we have X_1, …, X_n i.i.d., where EX_i = µ and Var(X_i) = σ². We want to know the expectation and variance of X̄_n = (1/n) Σ_{i=1}^n X_i. E(X̄_n) = E[(1/n) Σ_{i=1}^n X_i] = (1/n) Σ_{i=1}^n E(X_i) = (1/n) nµ = µ. Var(X̄_n) = Var((1/n) Σ_{i=1}^n X_i) = (1/n²) nσ² = σ²/n.
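E(X̄_n) = µ and Var(X̄_n) = σ²/n can be confirmed by exact enumeration rather than simulation; a sketch for the average of n = 2 fair dice.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
mu = Fraction(sum(faces), 6)                              # µ = 7/2
sigma2 = sum((Fraction(x) - mu) ** 2 for x in faces) / 6  # σ² = 35/12

# Enumerate all outcomes of X̄_n for n = 2 fair dice.
n = 2
mean = Fraction(0)
second = Fraction(0)
for roll in product(faces, repeat=n):
    xbar = Fraction(sum(roll), n)
    p = Fraction(1, 6**n)       # each pair of faces is equally likely
    mean += xbar * p
    second += xbar**2 * p

assert mean == mu                       # E(X̄_n) = µ
assert second - mean**2 == sigma2 / n   # Var(X̄_n) = σ²/n
```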
78 Entropy of a Distribution Entropy is a measure of uniformity in a distribution: H(X) = −Σ_x f_X(x) log₂ f_X(x). Imagine you had to transmit a sample from f_X, so you construct the optimal encoding scheme: entropy gives the mean depth in the tree (= mean number of bits).
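A short sketch of the entropy formula, checking the "measure of uniformity" intuition: uniform distributions maximize it, deterministic ones have none.

```python
import math

def entropy(pmf):
    # H(X) = −Σ_x f(x) log₂ f(x), with 0·log 0 taken as 0.
    return -sum(p * math.log2(p) for p in pmf if p > 0)

# Uniform over 4 outcomes: 2 bits (a 2-level balanced code tree).
assert abs(entropy([0.25] * 4) - 2.0) < 1e-12
# A deterministic outcome needs no bits at all.
assert entropy([1.0]) == 0.0
# Less uniform means lower entropy.
assert entropy([0.9, 0.1]) < entropy([0.5, 0.5])
```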
79-85 Law of Large Numbers (LLN) Recall our variable X̄_n = (1/n) Σ_{i=1}^n X_i. We may wonder about its behavior as n → ∞. We had: E(X̄_n) = µ, Var(X̄_n) = σ²/n. The distribution appears to be contracting: as n increases, the variance is going to 0. Using Chebyshev's inequality: P(|X̄_n − µ| ≥ ε) ≤ σ²/(nε²) → 0 for any fixed ε, as n → ∞. The weak law of large numbers: lim_{n→∞} P(|X̄_n − µ| < ε) = 1. In English: choose ε and a probability that |X̄_n − µ| < ε, and I can find you an n so your probability is achieved. The strong law of large numbers: P(lim_{n→∞} X̄_n = µ) = 1. In English: the sample mean converges to the expectation almost surely as n increases. Two different versions, each holding under different conditions, but i.i.d. and finite variance is enough for either.
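The contraction of X̄_n around µ is easy to see by simulation; a sketch with fair-die rolls (the seed and sample sizes are illustrative choices).

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def sample_mean(n):
    # X̄_n: average of n fair-die rolls.
    return sum(random.randint(1, 6) for _ in range(n)) / n

mu = 3.5
# With n = 100,000, Chebyshev gives P(|X̄_n − µ| ≥ 0.1) ≤ σ²/(nε²) ≈ 0.003,
# so the sample mean should land well inside (µ − 0.1, µ + 0.1).
gap_large = abs(sample_mean(100_000) - mu)
assert gap_large < 0.1
```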
86-88 Central Limit Theorem (CLT) The distribution of X̄_n also converges weakly to a Gaussian: lim_{n→∞} F_{X̄_n}(x) = Φ(√n (x − µ)/σ). Simulated n dice rolls and took the average, 5000 times: [histograms of the sample-mean density for n = 1, 2, 10, 75]. Two kinds of convergence went into this picture (why 5000?): 1. The true distribution converges to a Gaussian (CLT). 2. The empirical distribution converges to the true distribution (Glivenko-Cantelli).
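The dice simulation described above can be sketched directly: standardize the sample mean and check that, over many repetitions, roughly 68.3% of draws land within one standard deviation, as a standard normal would (repetition counts are illustrative).

```python
import math
import random

random.seed(1)

def standardized_mean(n):
    # √n (X̄_n − µ)/σ for n fair-die rolls; µ = 3.5, σ² = 35/12.
    mu, sigma = 3.5, math.sqrt(35 / 12)
    xbar = sum(random.randint(1, 6) for _ in range(n)) / n
    return math.sqrt(n) * (xbar - mu) / sigma

# 5000 repetitions, as in the slides' picture: enough for the empirical
# distribution to be close to the true one (Glivenko-Cantelli).
draws = [standardized_mean(30) for _ in range(5000)]
frac = sum(1 for z in draws if -1 < z < 1) / len(draws)
# A standard normal puts ≈ 0.683 of its mass in (−1, 1).
assert abs(frac - 0.683) < 0.03
```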
89 Asymptotics Opinion Ideas like these are crucial to machine learning: we want to minimize error on a whole population (e.g., classify text documents as well as possible), but we minimize error on a training set of size n. What happens as n → ∞? How does the complexity of the model, or the dimension of the problem, affect convergence?
More informationDefinition and Calculus of Probability
In experiments with multivariate outcome variable, knowledge of the value of one variable may help predict another. For now, the word prediction will mean update the probabilities of events regarding the
More informationGambling and Data Compression
Gambling and Data Compression Gambling. Horse Race Definition The wealth relative S(X) = b(x)o(x) is the factor by which the gambler s wealth grows if horse X wins the race, where b(x) is the fraction
More informationSTA 256: Statistics and Probability I
Al Nosedal. University of Toronto. Fall 2014 1 2 3 4 5 My momma always said: Life was like a box of chocolates. You never know what you re gonna get. Forrest Gump. Experiment, outcome, sample space, and
More informationLecture Note 1 Set and Probability Theory. MIT 14.30 Spring 2006 Herman Bennett
Lecture Note 1 Set and Probability Theory MIT 14.30 Spring 2006 Herman Bennett 1 Set Theory 1.1 Definitions and Theorems 1. Experiment: any action or process whose outcome is subject to uncertainty. 2.
More informationSums of Independent Random Variables
Chapter 7 Sums of Independent Random Variables 7.1 Sums of Discrete Random Variables In this chapter we turn to the important question of determining the distribution of a sum of independent random variables
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES Contents 1. Random variables and measurable functions 2. Cumulative distribution functions 3. Discrete
More informationElements of probability theory
2 Elements of probability theory Probability theory provides mathematical models for random phenomena, that is, phenomena which under repeated observations yield di erent outcomes that cannot be predicted
More informationProbability and statistics; Rehearsal for pattern recognition
Probability and statistics; Rehearsal for pattern recognition Václav Hlaváč Czech Technical University in Prague Faculty of Electrical Engineering, Department of Cybernetics Center for Machine Perception
More information10-601. Machine Learning. http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html
10-601 Machine Learning http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html Course data All up-to-date info is on the course web page: http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html
More informationAggregate Loss Models
Aggregate Loss Models Chapter 9 Stat 477 - Loss Models Chapter 9 (Stat 477) Aggregate Loss Models Brian Hartman - BYU 1 / 22 Objectives Objectives Individual risk model Collective risk model Computing
More informationProbability & Probability Distributions
Probability & Probability Distributions Carolyn J. Anderson EdPsych 580 Fall 2005 Probability & Probability Distributions p. 1/61 Probability & Probability Distributions Elementary Probability Theory Definitions
More informatione.g. arrival of a customer to a service station or breakdown of a component in some system.
Poisson process Events occur at random instants of time at an average rate of λ events per second. e.g. arrival of a customer to a service station or breakdown of a component in some system. Let N(t) be
More informationIEOR 6711: Stochastic Models I Fall 2012, Professor Whitt, Tuesday, September 11 Normal Approximations and the Central Limit Theorem
IEOR 6711: Stochastic Models I Fall 2012, Professor Whitt, Tuesday, September 11 Normal Approximations and the Central Limit Theorem Time on my hands: Coin tosses. Problem Formulation: Suppose that I have
More informationIAM 530 ELEMENTS OF PROBABILITY AND STATISTICS INTRODUCTION
IAM 530 ELEMENTS OF PROBABILITY AND STATISTICS INTRODUCTION 1 WHAT IS STATISTICS? Statistics is a science of collecting data, organizing and describing it and drawing conclusions from it. That is, statistics
More informationLecture 7: Continuous Random Variables
Lecture 7: Continuous Random Variables 21 September 2005 1 Our First Continuous Random Variable The back of the lecture hall is roughly 10 meters across. Suppose it were exactly 10 meters, and consider
More informationAn Introduction to Basic Statistics and Probability
An Introduction to Basic Statistics and Probability Shenek Heyward NCSU An Introduction to Basic Statistics and Probability p. 1/4 Outline Basic probability concepts Conditional probability Discrete Random
More informationLecture 1 Introduction Properties of Probability Methods of Enumeration Asrat Temesgen Stockholm University
Lecture 1 Introduction Properties of Probability Methods of Enumeration Asrat Temesgen Stockholm University 1 Chapter 1 Probability 1.1 Basic Concepts In the study of statistics, we consider experiments
More informationUniversity of California, Los Angeles Department of Statistics. Random variables
University of California, Los Angeles Department of Statistics Statistics Instructor: Nicolas Christou Random variables Discrete random variables. Continuous random variables. Discrete random variables.
More informationLecture 9: Introduction to Pattern Analysis
Lecture 9: Introduction to Pattern Analysis g Features, patterns and classifiers g Components of a PR system g An example g Probability definitions g Bayes Theorem g Gaussian densities Features, patterns
More informationExample. A casino offers the following bets (the fairest bets in the casino!) 1 You get $0 (i.e., you can walk away)
: Three bets Math 45 Introduction to Probability Lecture 5 Kenneth Harris aharri@umich.edu Department of Mathematics University of Michigan February, 009. A casino offers the following bets (the fairest
More informationSF2940: Probability theory Lecture 8: Multivariate Normal Distribution
SF2940: Probability theory Lecture 8: Multivariate Normal Distribution Timo Koski 24.09.2015 Timo Koski Matematisk statistik 24.09.2015 1 / 1 Learning outcomes Random vectors, mean vector, covariance matrix,
More informationSTAT 315: HOW TO CHOOSE A DISTRIBUTION FOR A RANDOM VARIABLE
STAT 315: HOW TO CHOOSE A DISTRIBUTION FOR A RANDOM VARIABLE TROY BUTLER 1. Random variables and distributions We are often presented with descriptions of problems involving some level of uncertainty about
More informationDiscrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 18. A Brief Introduction to Continuous Probability
CS 7 Discrete Mathematics and Probability Theory Fall 29 Satish Rao, David Tse Note 8 A Brief Introduction to Continuous Probability Up to now we have focused exclusively on discrete probability spaces
More informationQuestion: What is the probability that a five-card poker hand contains a flush, that is, five cards of the same suit?
ECS20 Discrete Mathematics Quarter: Spring 2007 Instructor: John Steinberger Assistant: Sophie Engle (prepared by Sophie Engle) Homework 8 Hints Due Wednesday June 6 th 2007 Section 6.1 #16 What is the
More informationSection 5.1 Continuous Random Variables: Introduction
Section 5. Continuous Random Variables: Introduction Not all random variables are discrete. For example:. Waiting times for anything (train, arrival of customer, production of mrna molecule from gene,
More informationRandom Variables. Chapter 2. Random Variables 1
Random Variables Chapter 2 Random Variables 1 Roulette and Random Variables A Roulette wheel has 38 pockets. 18 of them are red and 18 are black; these are numbered from 1 to 36. The two remaining pockets
More informationProbability: Terminology and Examples Class 2, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom
Probability: Terminology and Examples Class 2, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom 1 Learning Goals 1. Know the definitions of sample space, event and probability function. 2. Be able to
More informationBasic Probability Concepts
page 1 Chapter 1 Basic Probability Concepts 1.1 Sample and Event Spaces 1.1.1 Sample Space A probabilistic (or statistical) experiment has the following characteristics: (a) the set of all possible outcomes
More information4.1 4.2 Probability Distribution for Discrete Random Variables
4.1 4.2 Probability Distribution for Discrete Random Variables Key concepts: discrete random variable, probability distribution, expected value, variance, and standard deviation of a discrete random variable.
More informationDiscrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 13. Random Variables: Distribution and Expectation
CS 70 Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 3 Random Variables: Distribution and Expectation Random Variables Question: The homeworks of 20 students are collected
More informationAPPLIED MATHEMATICS ADVANCED LEVEL
APPLIED MATHEMATICS ADVANCED LEVEL INTRODUCTION This syllabus serves to examine candidates knowledge and skills in introductory mathematical and statistical methods, and their applications. For applications
More informationLecture 4: BK inequality 27th August and 6th September, 2007
CSL866: Percolation and Random Graphs IIT Delhi Amitabha Bagchi Scribe: Arindam Pal Lecture 4: BK inequality 27th August and 6th September, 2007 4. Preliminaries The FKG inequality allows us to lower bound
More informationMathematical Expectation
Mathematical Expectation Properties of Mathematical Expectation I The concept of mathematical expectation arose in connection with games of chance. In its simplest form, mathematical expectation is the
More informationLecture 8. Confidence intervals and the central limit theorem
Lecture 8. Confidence intervals and the central limit theorem Mathematical Statistics and Discrete Mathematics November 25th, 2015 1 / 15 Central limit theorem Let X 1, X 2,... X n be a random sample of
More informationProbability Generating Functions
page 39 Chapter 3 Probability Generating Functions 3 Preamble: Generating Functions Generating functions are widely used in mathematics, and play an important role in probability theory Consider a sequence
More informationStatistics 100A Homework 4 Solutions
Problem 1 For a discrete random variable X, Statistics 100A Homework 4 Solutions Ryan Rosario Note that all of the problems below as you to prove the statement. We are proving the properties of epectation
More informationHypothesis Testing for Beginners
Hypothesis Testing for Beginners Michele Piffer LSE August, 2011 Michele Piffer (LSE) Hypothesis Testing for Beginners August, 2011 1 / 53 One year ago a friend asked me to put down some easy-to-read notes
More informationUNIT I: RANDOM VARIABLES PART- A -TWO MARKS
UNIT I: RANDOM VARIABLES PART- A -TWO MARKS 1. Given the probability density function of a continuous random variable X as follows f(x) = 6x (1-x) 0
More information1. (First passage/hitting times/gambler s ruin problem:) Suppose that X has a discrete state space and let i be a fixed state. Let
Copyright c 2009 by Karl Sigman 1 Stopping Times 1.1 Stopping Times: Definition Given a stochastic process X = {X n : n 0}, a random time τ is a discrete random variable on the same probability space as
More informationNotes on Probability Theory
Notes on Probability Theory Christopher King Department of Mathematics Northeastern University July 31, 2009 Abstract These notes are intended to give a solid introduction to Probability Theory with a
More information. (3.3) n Note that supremum (3.2) must occur at one of the observed values x i or to the left of x i.
Chapter 3 Kolmogorov-Smirnov Tests There are many situations where experimenters need to know what is the distribution of the population of their interest. For example, if they want to use a parametric
More informationMartingale Ideas in Elementary Probability. Lecture course Higher Mathematics College, Independent University of Moscow Spring 1996
Martingale Ideas in Elementary Probability Lecture course Higher Mathematics College, Independent University of Moscow Spring 1996 William Faris University of Arizona Fulbright Lecturer, IUM, 1995 1996
More informationProbability for Computer Scientists
Probability for Computer Scientists This material is provided for the educational use of students in CSE2400 at FIT. No further use or reproduction is permitted. Copyright G.A.Marin, 2008, All rights reserved.
More informationDiscrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 10
CS 70 Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 10 Introduction to Discrete Probability Probability theory has its origins in gambling analyzing card games, dice,
More informationMaximum Likelihood Estimation
Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for
More informationLecture 3: Continuous distributions, expected value & mean, variance, the normal distribution
Lecture 3: Continuous distributions, expected value & mean, variance, the normal distribution 8 October 2007 In this lecture we ll learn the following: 1. how continuous probability distributions differ
More informationChapter ML:IV. IV. Statistical Learning. Probability Basics Bayes Classification Maximum a-posteriori Hypotheses
Chapter ML:IV IV. Statistical Learning Probability Basics Bayes Classification Maximum a-posteriori Hypotheses ML:IV-1 Statistical Learning STEIN 2005-2015 Area Overview Mathematics Statistics...... Stochastics
More informationNon Parametric Inference
Maura Department of Economics and Finance Università Tor Vergata Outline 1 2 3 Inverse distribution function Theorem: Let U be a uniform random variable on (0, 1). Let X be a continuous random variable
More informationMATH 140 Lab 4: Probability and the Standard Normal Distribution
MATH 140 Lab 4: Probability and the Standard Normal Distribution Problem 1. Flipping a Coin Problem In this problem, we want to simualte the process of flipping a fair coin 1000 times. Note that the outcomes
More informationStatistics 100A Homework 7 Solutions
Chapter 6 Statistics A Homework 7 Solutions Ryan Rosario. A television store owner figures that 45 percent of the customers entering his store will purchase an ordinary television set, 5 percent will purchase
More informationData Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber 2011 1
Data Modeling & Analysis Techniques Probability & Statistics Manfred Huber 2011 1 Probability and Statistics Probability and statistics are often used interchangeably but are different, related fields
More informationMath 370, Actuarial Problemsolving Spring 2008 A.J. Hildebrand. Practice Test, 1/28/2008 (with solutions)
Math 370, Actuarial Problemsolving Spring 008 A.J. Hildebrand Practice Test, 1/8/008 (with solutions) About this test. This is a practice test made up of a random collection of 0 problems from past Course
More informationBayesian Tutorial (Sheet Updated 20 March)
Bayesian Tutorial (Sheet Updated 20 March) Practice Questions (for discussing in Class) Week starting 21 March 2016 1. What is the probability that the total of two dice will be greater than 8, given that
More informationEx. 2.1 (Davide Basilio Bartolini)
ECE 54: Elements of Information Theory, Fall 00 Homework Solutions Ex.. (Davide Basilio Bartolini) Text Coin Flips. A fair coin is flipped until the first head occurs. Let X denote the number of flips
More informationProbability density function : An arbitrary continuous random variable X is similarly described by its probability density function f x = f X
Week 6 notes : Continuous random variables and their probability densities WEEK 6 page 1 uniform, normal, gamma, exponential,chi-squared distributions, normal approx'n to the binomial Uniform [,1] random
More informationINTRODUCTORY SET THEORY
M.Sc. program in mathematics INTRODUCTORY SET THEORY Katalin Károlyi Department of Applied Analysis, Eötvös Loránd University H-1088 Budapest, Múzeum krt. 6-8. CONTENTS 1. SETS Set, equal sets, subset,
More informationDiscrete Math in Computer Science Homework 7 Solutions (Max Points: 80)
Discrete Math in Computer Science Homework 7 Solutions (Max Points: 80) CS 30, Winter 2016 by Prasad Jayanti 1. (10 points) Here is the famous Monty Hall Puzzle. Suppose you are on a game show, and you
More information6.041/6.431 Spring 2008 Quiz 2 Wednesday, April 16, 7:30-9:30 PM. SOLUTIONS
6.4/6.43 Spring 28 Quiz 2 Wednesday, April 6, 7:3-9:3 PM. SOLUTIONS Name: Recitation Instructor: TA: 6.4/6.43: Question Part Score Out of 3 all 36 2 a 4 b 5 c 5 d 8 e 5 f 6 3 a 4 b 6 c 6 d 6 e 6 Total
More informationPolarization codes and the rate of polarization
Polarization codes and the rate of polarization Erdal Arıkan, Emre Telatar Bilkent U., EPFL Sept 10, 2008 Channel Polarization Given a binary input DMC W, i.i.d. uniformly distributed inputs (X 1,...,
More informationMath 461 Fall 2006 Test 2 Solutions
Math 461 Fall 2006 Test 2 Solutions Total points: 100. Do all questions. Explain all answers. No notes, books, or electronic devices. 1. [105+5 points] Assume X Exponential(λ). Justify the following two
More informationProbability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur
Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special Distributions-VI Today, I am going to introduce
More informationStatistics 100A Homework 3 Solutions
Chapter Statistics 00A Homework Solutions Ryan Rosario. Two balls are chosen randomly from an urn containing 8 white, black, and orange balls. Suppose that we win $ for each black ball selected and we
More informationMIMO CHANNEL CAPACITY
MIMO CHANNEL CAPACITY Ochi Laboratory Nguyen Dang Khoa (D1) 1 Contents Introduction Review of information theory Fixed MIMO channel Fading MIMO channel Summary and Conclusions 2 1. Introduction The use
More informationECE302 Spring 2006 HW4 Solutions February 6, 2006 1
ECE302 Spring 2006 HW4 Solutions February 6, 2006 1 Solutions to HW4 Note: Most of these solutions were generated by R. D. Yates and D. J. Goodman, the authors of our textbook. I have added comments in
More informationMath/Stats 342: Solutions to Homework
Math/Stats 342: Solutions to Homework Steven Miller (sjm1@williams.edu) November 17, 2011 Abstract Below are solutions / sketches of solutions to the homework problems from Math/Stats 342: Probability
More informationSF2940: Probability theory Lecture 8: Multivariate Normal Distribution
SF2940: Probability theory Lecture 8: Multivariate Normal Distribution Timo Koski 24.09.2014 Timo Koski () Mathematisk statistik 24.09.2014 1 / 75 Learning outcomes Random vectors, mean vector, covariance
More informationThe Binomial Distribution
The Binomial Distribution James H. Steiger November 10, 00 1 Topics for this Module 1. The Binomial Process. The Binomial Random Variable. The Binomial Distribution (a) Computing the Binomial pdf (b) Computing
More informationECE302 Spring 2006 HW3 Solutions February 2, 2006 1
ECE302 Spring 2006 HW3 Solutions February 2, 2006 1 Solutions to HW3 Note: Most of these solutions were generated by R. D. Yates and D. J. Goodman, the authors of our textbook. I have added comments in
More informationMath 370/408, Spring 2008 Prof. A.J. Hildebrand. Actuarial Exam Practice Problem Set 2 Solutions
Math 70/408, Spring 2008 Prof. A.J. Hildebrand Actuarial Exam Practice Problem Set 2 Solutions About this problem set: These are problems from Course /P actuarial exams that I have collected over the years,
More informationBasics of information theory and information complexity
Basics of information theory and information complexity a tutorial Mark Braverman Princeton University June 1, 2013 1 Part I: Information theory Information theory, in its modern format was introduced
More informationExamples: Joint Densities and Joint Mass Functions Example 1: X and Y are jointly continuous with joint pdf
AMS 3 Joe Mitchell Eamples: Joint Densities and Joint Mass Functions Eample : X and Y are jointl continuous with joint pdf f(,) { c 2 + 3 if, 2, otherwise. (a). Find c. (b). Find P(X + Y ). (c). Find marginal
More informationProbabilities. Probability of a event. From Random Variables to Events. From Random Variables to Events. Probability Theory I
Victor Adamchi Danny Sleator Great Theoretical Ideas In Computer Science Probability Theory I CS 5-25 Spring 200 Lecture Feb. 6, 200 Carnegie Mellon University We will consider chance experiments with
More information