NOTES ON ELEMENTARY PROBABILITY


KARL PETERSEN

1. Probability spaces

Probability theory is an attempt to work mathematically with the relative uncertainties of random events. In order to get started, we do not attempt to estimate the probability of occurrence of any event but instead assume that somehow these probabilities have already been arrived at and so are given to us in advance. These data are assembled in the form of a probability space (X, B, P), which consists of

(1) a set X, sometimes called the sample space, which is thought of as the set of all possible states of some system, or as the set of all possible outcomes of some experiment;
(2) a family B of subsets of X, which is thought of as the family of observable events; and
(3) a function P : B → [0, 1], which for each observable event E ∈ B gives the probability P(E) of occurrence of that event.

While the set X of all possible outcomes is an arbitrary set, for several reasons, which we will not discuss at this moment, the set B of observable events is not automatically assumed to consist of all subsets of X. (But if X is a finite set, then usually we do take B to be the family of all subsets of X.) We also assume that the family B of observable events and the probability measure P satisfy a minimal list of properties which permit calculations of probabilities of combinations of events:

(1) P(X) = 1.
(2) B contains X and is closed under the set-theoretic operations of union, intersection, and complementation: if E, F ∈ B, then E ∪ F ∈ B, E ∩ F ∈ B, and E^c = X \ E ∈ B. (Recall that E ∪ F is the set of all elements of X that are either in E or in F, E ∩ F is the set of all elements of X that are in both E and F, and E^c is the set of all elements of X that are not in E.)
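For a finite sample space these axioms are easy to realize concretely. The sketch below (my own illustration, not part of the notes) builds P on all subsets of X from the point probabilities for a single fair coin flip and checks axioms (1) and (3):

```python
from fractions import Fraction

# A finite probability space: X = {0, 1} models one flip of a fair coin,
# B = all subsets of X, and P is determined by the point probabilities.
X = {0, 1}
point_prob = {0: Fraction(1, 2), 1: Fraction(1, 2)}  # P(heads) = P(tails) = 1/2

def P(event):
    """Probability of an observable event E, a subset of X."""
    return sum(point_prob[x] for x in event)

# Axiom (1): P(X) = 1.
assert P(X) == 1
# Additivity on the disjoint events {0} and {1}.
assert P({0} | {1}) == P({0}) + P({1})
```

With exact rational arithmetic there is no floating-point slack in these checks.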

In fact, in order to permit even more calculations (but not too many) we suppose that the union and intersection of countably many members of B are also still in B.

(3) If E, F ∈ B are disjoint, so that E ∩ F = ∅, then P(E ∪ F) = P(E) + P(F). In fact, we assume that P is countably additive: if E_1, E_2, ... ∈ B are pairwise disjoint (so that E_i ∩ E_j = ∅ if i ≠ j), then

(1)  P(⋃_{i=1}^∞ E_i) = P(E_1 ∪ E_2 ∪ ⋯) = P(E_1) + P(E_2) + ⋯ = Σ_{i=1}^∞ P(E_i).

Example 1.1. In some simple but still interesting and useful cases, X is a finite set such as {0, ..., d−1} and B consists of all subsets of X. Then P is determined by specifying the value p_i = P(i) of each individual point i of X. For example, the single flip of a fair coin is modeled by letting X = {0, 1}, with 0 representing the outcome heads and 1 the outcome tails, and defining P(0) = P(1) = 1/2. Note that the probabilities of all subsets of X are then determined (in the case of the single coin flip, P(X) = 1 and P(∅) = 0).

Exercise 1.1. Set up the natural probability space that describes the roll of a single fair die and find the probability that the outcome of any roll is a number greater than 2.

Exercise 1.2. When a pair of fair dice is rolled, what is the probability that the sum of the two numbers shown (on the upward faces) is even?

Exercise 1.3. In a certain lottery one gets to try to match (after paying an entry fee) a set of 6 different numbers that have been previously chosen from {1, ..., 30}. What is the probability of winning?

Exercise 1.4. What is the probability that a number selected at random from {1, ..., 100} is divisible by both 3 and 7?

Exercise 1.5. A fair coin is flipped 10 times. What is the probability that heads comes up twice in a row?

Exercise 1.6. Ten fair coins are dropped on the floor. What is the probability that at least two of them show heads?

Exercise 1.7. A fair coin is flipped ten times. What is the probability that heads comes up at least twice?

Exercise 1.8. Show that if E and F are observable events in any probability space, then

(2)  P(E ∪ F) = P(E) + P(F) − P(E ∩ F).
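The inclusion–exclusion identity of Exercise 1.8 can be checked by brute-force enumeration. Here is a sketch on the single-die space of Exercise 1.1; the two events chosen are illustrative, not from the notes:

```python
from fractions import Fraction

# Single fair die: X = {1, ..., 6}, every outcome has probability 1/6.
X = set(range(1, 7))

def P(event):
    return Fraction(len(event), len(X))

E = {x for x in X if x % 2 == 0}   # even outcome
F = {x for x in X if x > 2}        # outcome greater than 2

# P(E ∪ F) = P(E) + P(F) − P(E ∩ F)
assert P(E | F) == P(E) + P(F) - P(E & F)
```

Here P(E) = 1/2, P(F) = 2/3, P(E ∩ F) = 1/3, and both sides come out to 5/6.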

2. Conditional probability

Let (X, B, P) be a probability space and let Y ∈ B with P(Y) > 0. We can restrict our attention to Y, making it the set of possible states or outcomes for a probability space as follows:

(1) The set of states is Y ⊆ X with P(Y) > 0;
(2) The family of observable events is defined to be

(3)  B_Y = {E ∩ Y : E ∈ B};

(3) The probability measure P_Y is defined on B_Y by

(4)  P_Y(A) = P(A)/P(Y)  for all A ∈ B_Y.

Forming the probability space (Y, B_Y, P_Y) is called conditioning on Y. It models the revision of probability assignments when the event Y is known to have occurred: we think of P_Y(A) as the probability that A occurred, given that we already know that Y occurred.

Example 2.1. When a fair die is rolled, the probability of an even number coming up is 1/2. What is the probability that an even number came up if we are told that the number showing is greater than 3? Out of the three possible outcomes in Y = {4, 5, 6}, two are even, so the answer is 2/3.

Definition 2.1. For any (observable) Y ⊆ X with P(Y) > 0 and any (observable) E ⊆ X we define the conditional probability of E given Y to be

(5)  P(E | Y) = P(E ∩ Y)/P(Y) = P_Y(E ∩ Y).

Exercise 2.1. A fair coin is flipped three times. What is the probability of at least one head? Given that the first flip was tails, what is the probability of at least one head?

Exercise 2.2. From a group of two men and three women a set of three representatives is to be chosen. Each member is equally likely to be selected. Given that the set includes at least one member of each sex, what is the probability that there are more men than women in it?

Definition 2.2. The observable events A and B in a probability space (X, B, P) are said to be independent in case

(6)  P(A ∩ B) = P(A)P(B).
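Both notions on this page, conditioning (Example 2.1) and independence (Definition 2.2), can be sketched on the single-die space; the helper names are mine:

```python
from fractions import Fraction

X = set(range(1, 7))                  # one roll of a fair die

def P(event):
    return Fraction(len(event), len(X))

def P_given(A, Y):
    """P(A | Y) = P(A ∩ Y)/P(Y), assuming P(Y) > 0 (Definition 2.1)."""
    return P(A & Y) / P(Y)

def independent(A, B):
    """Definition 2.2: A and B are independent iff P(A ∩ B) = P(A)P(B)."""
    return P(A & B) == P(A) * P(B)

even = {2, 4, 6}
Y = {4, 5, 6}                         # told: the number showing is greater than 3

assert P(even) == Fraction(1, 2)
assert P_given(even, Y) == Fraction(2, 3)   # as computed in Example 2.1
assert not independent(even, Y)             # P(even ∩ Y) = 1/3 ≠ (1/2)(1/2)
```

Knowing Y changed the probability of an even number from 1/2 to 2/3, which is exactly why these two events fail the independence test.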

Notice that in case one of the events has positive probability, say P(B) > 0, then A and B are independent if and only if

(7)  P(A | B) = P(A);

that is, knowing that B has occurred does not change the probability that A has occurred.

Example 2.2. A fair coin is flipped twice. What is the probability that heads occurs on the second flip, given that it occurs on the first flip? We model the two flips of the coin by bit strings of length two, writing 0 for heads and 1 for tails on each of the two flips. If Y is the set of outcomes which have heads on the first flip, and A is the set that has heads on the second flip, then X = {00, 01, 10, 11}, Y = {00, 01}, A = {10, 00}, so that A ∩ Y = {00} includes exactly one of the two elements of Y. Since each of the four outcomes in X is equally likely,

P(A | Y) = P(A ∩ Y)/P(Y) = |A ∩ Y|/|Y| = 1/2 = P(A).

Thus we see that A and Y are independent.

This example indicates that the definition of independence in probability theory reflects our intuitive notion of events whose occurrences do not influence one another. If repeated flips of a fair coin are modeled by a probability space consisting of bit strings of length n, all being equally likely, then an event whose occurrence is determined by a certain range of coordinates is independent of any other event that is determined by a disjoint range of coordinates.

Example 2.3. A fair coin is flipped four times. Let A be the event that we obtain a head on the second flip and B be the event that among the first, third, and fourth flips we obtain at least two heads. Then A and B are independent.

Exercise 2.3. Show that the events A and B described in the preceding example will be independent whether or not the coin being flipped is fair.
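The claim of Example 2.3 can be verified by enumerating all 16 equally likely outcomes; a quick sketch, with 0 standing for heads as in Example 2.2:

```python
from fractions import Fraction
from itertools import product

# All bit strings of length 4, equally likely; 0 = heads, 1 = tails.
X = list(product([0, 1], repeat=4))

def P(event):
    return Fraction(len(event), len(X))

A = {x for x in X if x[1] == 0}                           # head on the second flip
B = {x for x in X if (x[0], x[2], x[3]).count(0) >= 2}    # >= 2 heads among flips 1, 3, 4

assert P(A & B) == P(A) * P(B)   # A and B are independent
```

A is determined by the second coordinate and B by the disjoint range of coordinates 1, 3, 4, illustrating the general principle stated just above the example.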

Exercise 2.4. Show that the events A and B described in the preceding example will be independent even if the probability of heads could be different on each flip.

Exercise 2.5. When a pair of fair dice is rolled, is the probability of the sum of the numbers shown being even independent of it being greater than six?

3. Bayes' Theorem

Looking at the definition of conditional probability somewhat backwards leads very easily to a simple formula that is highly useful in practice and has profound implications for the foundations of probability theory (frequentists, subjectivists, etc.). We use the notation from [1], in which C is an event, thought of as a cause, such as the presence of a disease, and I is another event, thought of as the existence of certain information. The formula can be interpreted as telling us how to revise our original estimate P(C) that the cause C is present if we are given the information I.

Theorem 3.1 (Bayes' Theorem). Let (X, B, P) be a probability space and let C, I ∈ B with P(I) > 0. Then

(8)  P(C | I) = P(C) · P(I | C) / P(I).

Proof. We just use the definitions of the conditional probabilities,

(9)  P(C | I) = P(C ∩ I)/P(I),  P(I | C) = P(I ∩ C)/P(C),

and the fact that C ∩ I = I ∩ C.

Example 3.1. We discuss the example in [1, p. 77] in this notation. C is the event that a patient has cancer, and P(C) is taken to be .01, the incidence of cancer in the general population for this example being taken to be 1 in 100. I is the event that the patient tests positive on a certain test for this disease. The test is said to be 99% accurate, which we take to mean that the probability of error is less than .01, in the sense that P(I | C^c) < .01 and P(I^c | C) < .01. Then P(I | C) ≈ 1, and

(10)  P(I) = P(I | C)P(C) + P(I | C^c)P(C^c) ≈ .01 + (.01)(.99) ≈ .02.

Applying Bayes' Theorem,

(11)  P(C | I) = P(C) · P(I | C) / P(I) ≈ (.01)(1)/(.02) = .5.

The surprising conclusion is that even with such an apparently accurate test, if someone tests positive for this cancer there is only about a 50% chance that he actually has the disease.

Often Bayes' Theorem is stated in a form in which there are several possible causes C_1, C_2, ... which might lead to a result I with P(I) > 0. If we assume that the observable events C_1, C_2, ... form a partition of the probability space X, so that they are pairwise disjoint and their union is all of X, then

(12)  P(I) = P(I | C_1)P(C_1) + P(I | C_2)P(C_2) + ⋯,

and Equation (8) says that for each i,

(13)  P(C_i | I) = P(C_i) · P(I | C_i) / [P(I | C_1)P(C_1) + P(I | C_2)P(C_2) + ⋯].

This formula applies for any finite number of observable events C_i as well as for a countably infinite number of them.

Exercise 3.1. Suppose we want to use a set of medical tests to look for the presence of one of two diseases. Denote by S the event that the test gives a positive result and by D_i the event that a patient has disease i = 1, 2. Suppose we know the incidences of the two diseases in the population:

(14)  P(D_1) = .07,  P(D_2) = .05,  P(D_1 ∩ D_2) = .01.

From studies of many patients over the years it has also been learned that

(15)  P(S | D_1) = .9,  P(S | D_2) = .8,  P(S | (D_1 ∪ D_2)^c) = .05,  P(S | D_1 ∩ D_2) = .99.

(a) Form a partition of the underlying probability space X that will help to analyze this situation.
(b) Find the probability that a patient has disease 1 if the battery of tests turns up positive.
(c) Find the probability that a patient has disease 1 but not disease 2 if the battery of tests turns up positive.
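The computation in Example 3.1 is a few lines of arithmetic to sketch in code. One assumption beyond the notes: the error probabilities are taken to be exactly .01 in each direction, so P(I | C) = .99 and P(I | C^c) = .01:

```python
# Bayes' Theorem with the numbers of Example 3.1.
# Assumption: the test errs with probability exactly .01 each way,
# i.e. P(I | C) = .99 and P(I | C^c) = .01.
p_C = 0.01                      # incidence of the disease, P(C)
p_I_given_C = 0.99              # true positive rate
p_I_given_not_C = 0.01          # false positive rate

# Total probability, as in Equation (12):
p_I = p_I_given_C * p_C + p_I_given_not_C * (1 - p_C)

# Bayes' Theorem, Equation (8):
p_C_given_I = p_C * p_I_given_C / p_I

print(round(p_C_given_I, 3))
```

With these exact rates the posterior is precisely 1/2, matching the "only a 50% chance" conclusion; the rarity of the disease, not the quality of the test, drives the result.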

4. Bernoulli trials

In Section 2 we came across independent repeated trials of an experiment, such as flipping a coin or rolling a die. Such a sequence is conveniently represented by a probability space whose elements are strings on a finite alphabet. Equivalently, if a single run of the experiment is modeled by a probability space (D, B, P), then n independent repetitions of the experiment are modeled by the Cartesian product of D with itself n times, with the probability measure formed by a product of P with itself n times. We now state this more precisely.

Let (D, B, P) be a probability space with D = {0, ..., d−1}, B = the family of all subsets of D, and P(i) = p_i > 0 for i = 0, ..., d−1. Denote by D^(n) the Cartesian product of D with itself n times. Thus D^(n) consists of all ordered n-tuples (x_1, ..., x_n) with each x_i ∈ D, i = 1, ..., n. If we omit the commas and parentheses, we can think of each element of D^(n) as a string of length n on the alphabet D.

Example 4.1. If D = {0, 1} and n = 3, then D^(n) = {000, 001, 010, 011, 100, 101, 110, 111}, the set of all bit strings of length 3.

We now define the set of observables in D^(n) to be B^(n) = the family of all subsets of D^(n). The probability measure P^(n) on D^(n) is determined by

(16)  P^(n)(x_1 x_2 ⋯ x_n) = P(x_1)P(x_2) ⋯ P(x_n)

for each x_1 x_2 ⋯ x_n ∈ D^(n). This definition of P^(n) in terms of products of probabilities seen in the different coordinates (or entries) of a string guarantees the independence of two events that are determined by disjoint ranges of coordinates. Note that this holds true even if the strings of length n are not all equally likely.

Exercise 4.1. A coin whose probability of heads is p, with 0 < p < 1/2, is flipped three times. Write out the probabilities of all the possible outcomes. If A is the event that the second flip produces heads, and B is the event that either the first or third flip produces tails, find P^(3)(A ∩ B) and P^(3)(A)P^(3)(B).
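The product measure (16) is straightforward to implement. This sketch (function and variable names are mine) builds P^(n) for a biased coin and checks that the probabilities of all strings of length 3 sum to 1:

```python
from fractions import Fraction
from itertools import product

# A biased coin: symbol 0 (heads) with probability 1/3, symbol 1 with 2/3.
p = {0: Fraction(1, 3), 1: Fraction(2, 3)}

def P_n(string):
    """Product measure P^(n) of Equation (16): multiply coordinate probabilities."""
    result = Fraction(1)
    for symbol in string:
        result *= p[symbol]
    return result

n = 3
all_strings = list(product([0, 1], repeat=n))
assert sum(P_n(s) for s in all_strings) == 1   # P^(n) is a probability measure
```

Note that the eight strings are no longer equally likely, yet events on disjoint coordinate ranges remain independent, as the text asserts.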

Let D = {0, 1} and P(0) = p ∈ (0, 1), P(1) = 1 − p. Construct as above the probability space (D^(n), B^(n), P^(n)) representing n independent repetitions of the experiment (D, B, P). The binomial distribution gives, for each k = 0, 1, ..., n, the probability of the set of strings of length n that contain exactly k 0's. Recall that C(n, k) denotes the binomial coefficient n!/(k!(n−k)!), the number of k-element subsets of a set with n elements.

Proposition 4.1. Let (D^(n), B^(n), P^(n)) be as described above. Then for each k = 0, 1, ..., n,

(17)  P^(n){x_1 ⋯ x_n ∈ D^(n) : x_i = 0 for exactly k choices of i = 1, ..., n} = C(n, k) p^k (1−p)^(n−k).

Proof. For each subset S of {1, ..., n}, let E(S) = {x ∈ D^(n) : x_i = 0 if and only if i ∈ S}. Note that if S_1 and S_2 are different subsets of {1, ..., n}, then E(S_1) and E(S_2) are disjoint. Fix k = 0, 1, ..., n. There are C(n, k) subsets of {1, ..., n} which have exactly k elements, and for each such subset S we have P^(n)(E(S)) = p^k (1−p)^(n−k). Adding up the probabilities of these disjoint sets gives the result.

Exercise 4.2. For the situation in Exercise 4.1 and each k = 0, 1, 2, 3, list the elements of A_k = the event that exactly k heads occur. Also calculate the probability of each A_k.

Representing repetitions of an experiment with finitely many possible outcomes by strings on a finite alphabet draws an obvious connection with the modeling of information transfer or acquisition. A single experiment can be viewed as reading a single symbol, which is thought of as the outcome of the experiment. We can imagine strings (or experimental runs) of arbitrary lengths, and in fact even of infinite length. For example, we can consider the space of one-sided infinite bit strings

(18)  Ω⁺ = {x_0 x_1 x_2 ⋯ : each x_i = 0 or 1},

as well as the space of two-sided infinite bit strings

(19)  Ω = {⋯ x_{−1} x_0 x_1 ⋯ : each x_i = 0 or 1}.
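Proposition 4.1 can be checked against brute-force enumeration for a small n; a sketch with an illustrative p of my choosing:

```python
from fractions import Fraction
from itertools import product
from math import comb

p = Fraction(1, 3)               # probability of the symbol 0 on each trial
n = 5

def P_n(string):
    """Bernoulli product measure, as in Equation (16)."""
    result = Fraction(1)
    for symbol in string:
        result *= p if symbol == 0 else 1 - p
    return result

for k in range(n + 1):
    # Left side of (17): sum over all strings with exactly k zeros.
    lhs = sum(P_n(s) for s in product([0, 1], repeat=n) if s.count(0) == k)
    # Right side of (17): the binomial formula.
    rhs = comb(n, k) * p**k * (1 - p)**(n - k)
    assert lhs == rhs
```

The enumeration groups the 2^n strings into the disjoint sets E(S) of the proof, one per k-element subset S, which is exactly where the factor C(n, k) comes from.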

Given p with 0 < p < 1, we can again define a probability measure for many events in either of these spaces: for example,

(20)  P_p^(∞)({x : x_2 = 0, x_6 = 1, x_7 = 1}) = p(1−p)(1−p).

A set such as the one above, determined by specifying the entries in a finite number of places in a string, is called a cylinder set. Let us define the probability of each cylinder set in accord with the idea that 0's and 1's are coming independently, with probabilities p and 1−p, respectively. Thus, if 0 ≤ i_1 < i_2 < ⋯ < i_r, each a_1, ..., a_r = 0 or 1, and s of the a_j's are 0, let

(21)  P_p^(∞){x ∈ Ω⁺ : x_{i_1} = a_1, ..., x_{i_r} = a_r} = p^s (1−p)^(r−s).

It takes some effort (which we will not expend at this moment) to see that this definition does not lead to any contradictions, and that there is a unique extension of P_p^(∞) so as to be defined on a family B^(∞) which contains all the cylinder sets and is closed under complementation, countable unions, and countable intersections.

Definition 4.1. If D is an arbitrary finite set, we denote by Ω⁺(D) the set of one-sided infinite strings x_0 x_1 x_2 ... with entries from the alphabet D, and we denote by Ω(D) the set of two-sided infinite strings with entries from D. We abbreviate Ω⁺ = Ω⁺({0, 1}) and Ω = Ω({0, 1}). With each of these sequence spaces we deal always with a fixed family B of observable events which contains the cylinder sets and is closed under countable unions, countable intersections, and complementation.

The spaces Ω⁺(D) and Ω(D) are useful models of information sources, especially when combined with a family of observables B which contains all cylinder sets and with a probability measure P defined on B. (We are dropping the extra superscripts on B and P in order to simplify the notation.)
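Formula (21) depends only on which symbols are specified, not on where they sit; a small sketch (the function name is my own):

```python
from fractions import Fraction

def cylinder_prob(spec, p):
    """Probability, per Equation (21), of the cylinder set
    {x : x_i = a_i for each (i, a_i) in spec}: a factor p for every
    specified 0 and a factor (1 - p) for every specified 1."""
    result = Fraction(1)
    for index, symbol in spec.items():
        result *= p if symbol == 0 else 1 - p
    return result

p = Fraction(1, 4)
# The cylinder set of Equation (20): x_2 = 0, x_6 = 1, x_7 = 1.
assert cylinder_prob({2: 0, 6: 1, 7: 1}, p) == p * (1 - p) * (1 - p)
```

The unspecified coordinates contribute no factor at all, which is exactly the independence built into the Bernoulli measure.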
Given a string a = a_0 ⋯ a_{r−1} on the symbols of the alphabet D and a time n ≥ 0, the probability that the source emits the string at time n is given by the probability of the cylinder set {x : x_n = a_0, x_{n+1} = a_1, ..., x_{n+r−1} = a_{r−1}}. Requiring that countable unions and intersections of observable events be observable allows us to consider quite interesting and complicated events, including various combinations of infinite sequences of events.

Example 4.2. In the space Ω⁺ constructed above, with the probability measure P_p^(∞), let us see that the set of (one-sided) infinite strings which contain infinitely many 0's has probability 1. For this purpose

we assume (as can be proved rigorously) that the probability space (Ω⁺, B^(∞), P_p^(∞)) does indeed satisfy the properties set out axiomatically at the beginning of these notes. Let

A = {x ∈ Ω⁺ : x_i = 0 for infinitely many i}.

We aim to show that P^(∞)(A^c) = 0 (A^c = Ω⁺ \ A = the complement of A), and hence P^(∞)(A) = 1. For each n = 0, 1, 2, ... let

B_n = {x ∈ Ω⁺ : x_n = 0 but x_i = 1 for all i > n},

and let B_{−1} consist of the single string 111⋯, which contains no 0's at all. Then the sets B_n are pairwise disjoint and their union is A^c. By countable additivity,

P_p^(∞)(⋃_{n=−1}^∞ B_n) = Σ_{n=−1}^∞ P_p^(∞)(B_n),

so it is enough to show that P_p^(∞)(B_n) = 0 for all n. Fix any n = −1, 0, 1, 2, .... For each r = 1, 2, ...,

B_n ⊆ Z_{n+1,n+r} = {x ∈ Ω⁺ : x_{n+1} = x_{n+2} = ⋯ = x_{n+r} = 1},

and P_p^(∞)(Z_{n+1,n+r}) = (1−p)^r. Since 0 < 1−p < 1, we have (1−p)^r → 0 as r → ∞, so P_p^(∞)(B_n) = 0 for each n.

If A is an observable event in any probability space which has probability 1, then we say that A occurs almost surely, or with probability 1. If some property holds for all points x in a set of probability 1, then we say that the property holds almost everywhere.

Exercise 4.3. In the probability space (Ω⁺, B^(∞), P_p^(∞)) constructed above, find the probability of the set of infinite strings of 0's and 1's which never have two 1's in a row. (Hint: For each n = 0, 1, 2, ... consider B_n = {x ∈ Ω⁺ : x_{2n} x_{2n+1} ≠ 11}.)

5. Markov chains

Symbols in strings or outcomes of repeated experiments are not always completely independent of one another; frequently there are relations, interactions, or dependencies among the entries in various coordinates. In English text, the probabilities of letters depend heavily on letters near them: h is much more likely to follow t than to follow f. Some phenomena can show very long-range order, even infinite memory. Markov chains model processes with only short-range memory, in which the probability of what symbol comes next depends only on a fixed number of the immediately preceding symbols. In the simplest case, 1-step Markov chains, the probability of what comes next depends only on the immediately preceding symbol: the outcome of any repetition of the experiment depends only on the outcome of the immediately preceding one and not on any before that.

The precise definition of a Markov chain on a finite state space, or alphabet, D = {0, 1, ..., d−1} is as follows. The sample space is the set Σ⁺ of all one-sided (they could also be two-sided) infinite sequences x = x_0 x_1 ... with entries from the alphabet D. The family of observable events again contains all the cylinder sets. The probability measure M is determined by two pieces of data:

(1) a probability vector p = (p_0, ..., p_{d−1}), with each p_i ≥ 0 and p_0 + ⋯ + p_{d−1} = 1, giving the initial distribution for the chain;
(2) a matrix P = (P_ij) giving the transition probabilities between each pair of states i, j ∈ D. It is assumed that each P_ij ≥ 0 and that for each i we have P_{i,0} + P_{i,1} + ⋯ + P_{i,d−1} = 1. Such a P is called a stochastic matrix.

Now the probability of each basic cylinder set determined by fixing the first n entries at values a_0, ..., a_{n−1} ∈ D is defined to be

(22)  M{x ∈ Σ⁺ : x_0 = a_0, ..., x_{n−1} = a_{n−1}} = p_{a_0} P_{a_0 a_1} P_{a_1 a_2} ⋯ P_{a_{n−2} a_{n−1}}.

The idea here is simple.
The initial symbol of a string, at coordinate 0, is selected with probability determined by the initial distribution p: symbol i has probability p_i of appearing, for each i = 0, 1, ..., d−1. Then, given that symbol, a_0, the probability of transitioning to any other symbol is determined by the entries in the matrix P, specifically the entries in row a_0: the probability that a_1 comes next, given that we just saw a_0, is P_{a_0 a_1}. And so on. The condition that the matrix P have

row sums 1 tells us that we are sure to be able to add some symbol each time. The 1-step memory property can be expressed as follows: for any choice of symbols a_0, ..., a_n,

M{x ∈ Σ⁺ : x_n = a_n | x_0 = a_0, ..., x_{n−1} = a_{n−1}} = M{x ∈ Σ⁺ : x_n = a_n | x_{n−1} = a_{n−1}}.

Finite-state Markov chains are conveniently visualized in terms of random paths on directed graphs.

[Figure: a directed graph on the states 0, 1, 2, with the transition probabilities labeling the arrows between them.]

Here the states are 0, 1, 2 and the transition probabilities between states are the labels on the arrows. Thus the stochastic transition matrix is

P = ( [row 0 illegible]
      1/2  1/4  1/4
      1/2   0   1/2 ).

If we specified an initial distribution p = (1/6, 1/2, 1/3) listing the initial probabilities of the states 0, 1, 2, respectively, then the probabilities of strings starting at the initial coordinate would be calculated as in this example:

M{x ∈ Σ⁺ : x_0 = 1, x_1 = 1, x_2 = 0} = p_1 P_{11} P_{10} = (1/2)(1/4)(1/2) = 1/16.

Exercise 5.1. For the example above, with p and P as given, find the probabilities of all the positive-probability strings of length 3.

Recall that the vector p = (p_0, ..., p_{d−1}) gives the initial distribution: the probability that at time 0 the system is in state j ∈ {0, ..., d−1} is p_j. So what is the probability that the system is in state j at time 1? Well, the event that the system is in state j at time 1, namely {x ∈ Σ⁺ : x_1 = j}, is the union of d disjoint sets defined by the different

possible values of x_0:

(23)  {x ∈ Σ⁺ : x_1 = j} = ⋃_{i=0}^{d−1} {x ∈ Σ⁺ : x_0 = i, x_1 = j}.

Since the i-th one of these sets has probability p_i P_{ij}, we have

(24)  M{x ∈ Σ⁺ : x_1 = j} = Σ_{i=0}^{d−1} p_i P_{ij}.

So we have determined the distribution p^(1) of the chain at time 1. The equations

(25)  p^(1)_j = Σ_{i=0}^{d−1} p_i P_{ij}  for j = 0, ..., d−1

are abbreviated, using multiplication of vectors by matrices, by

(26)  p^(1) = pP.

Similarly, the distribution at time 2 is given by

(27)  p^(2) = p^(1) P = pP²,

where P² is the square of the matrix P according to matrix multiplication. And so on: the probability that at any time n = 0, 1, 2, ... the chain is in state j = 0, ..., d−1 is (pP^n)_j, namely, the j-th entry of the vector obtained by multiplying the initial distribution vector p on the right n times by the stochastic transition matrix P.

Here's a quick definition of matrix multiplication. Suppose that A is a matrix with m rows and n columns (m, n ≥ 1; if either equals 1, A is a (row or column) vector). Suppose that B is a matrix with n rows and p columns. Then AB is defined as a matrix with m rows and p columns. The entry in the i-th row and j-th column of the product AB is formed by using the i-th row of A and the j-th column of B: take the sum of the products of the entries in the i-th row of A (there are n of them) with the entries in the j-th column of B (there are also n of these). This is the dot product, or scalar product, of the i-th row of A with the j-th column of B:

(28)  (AB)_ij = Σ_{k=1}^{n} A_ik B_kj,  for i = 1, ..., m; j = 1, ..., p.

Note that here we have numbered entries starting with 1 rather than with 0. (This is how Matlab usually does it.)
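The evolution p, pP, pP², ... can be sketched without any linear-algebra library. The matrix below is hypothetical (the notes' example matrix is only partly legible here), chosen simply to be stochastic:

```python
from fractions import Fraction

def mat_vec(p, P):
    """Row vector times matrix: (pP)_j = sum_i p_i * P_ij, as in Equation (26)."""
    d = len(p)
    return [sum(p[i] * P[i][j] for i in range(d)) for j in range(d)]

# A hypothetical 3-state stochastic matrix: each row is a probability vector.
half, quarter = Fraction(1, 2), Fraction(1, 4)
P = [[0, 1, 0],
     [half, quarter, quarter],
     [half, 0, half]]

p = [Fraction(1, 6), Fraction(1, 2), Fraction(1, 3)]   # initial distribution

p1 = mat_vec(p, P)      # distribution at time 1, p^(1) = pP
p2 = mat_vec(p1, P)     # distribution at time 2, p^(2) = pP^2

assert sum(p1) == 1 and sum(p2) == 1   # each pP^n is again a distribution
```

Because every row of P sums to 1, applying `mat_vec` preserves the total probability, which is the algebraic content of the remark that "we are sure to be able to add some symbol each time."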

Markov chains have many applications in physics, biology, psychology (learning theory), and even sociology. Here is an unrealistic but suggestive indication of possible applications.

Exercise 5.2. Suppose that a certain study divides women into three groups according to their level of education: completed college, completed high school but not college, or did not complete high school. Suppose that data are accumulated showing that the daughter of a college-educated mother has probability .7 of also completing college, probability .2 of only making it through high school, and probability .1 of not finishing high school; the daughter of a mother who only finished high school has probabilities .5, .3, and .2, respectively, of finishing college, high school only, or neither; and the daughter of a mother who did not finish high school has corresponding probabilities .3, .4, and .3.

(a) We start with a population in which 30% of women finished college, 50% finished high school but not college, and 20% did not finish high school. What is the probability that a granddaughter of one of these women who never finished high school will make it through college?

(b) Suppose that the initial distribution among the different groups is (.5857, .2571, .1571). What will be the distribution in the next generation? The one after that? The one after that?

Remark 5.1. Under some not too stringent hypotheses, the powers P^k of the stochastic transition matrix P of a Markov chain will converge to a matrix Q all of whose rows are equal to the same vector q, which then satisfies qP = q and is called the stable distribution for the Markov chain. You can try this out easily in Matlab by starting with various stochastic matrices P and squaring repeatedly.

6. Space mean and time mean

Definition 6.1. A random variable on a probability space (X, B, P) is a function f : X → R such that for each interval (a, b) of real numbers, the event {x ∈ X : f(x) ∈ (a, b)} is an observable event.
More briefly,

(29)  f^{−1}(a, b) ∈ B for all a, b ∈ R.

This definition seeks to capture the idea of making measurements on a random system, without getting tangled in talk about numbers fluctuating in unpredictable ways.

Example 6.1. In an experiment of rolling two dice, a natural sample space is X = {(i, j) : i, j = 1, ..., 6}. We take B = the family of all subsets of X and assume that all 36 outcomes are equally likely. One important random variable on this probability space is the sum of the numbers rolled: s(i, j) = i + j for all (i, j) ∈ X.

Example 6.2. If X is the set of bit strings of length 7, B = all subsets of X, and all strings are equally likely, we could consider the random variable s(x) = x_0 + ⋯ + x_6 = the number of 1's in x.

In the following definitions let (X, B, P) be a probability space.

Definition 6.2. A partition of X is a family {A_1, ..., A_n} of observable subsets of X (each A_i ∈ B) which are pairwise disjoint and whose union is X. The sets A_i are called the cells of the partition.

Definition 6.3. A simple random variable on X is a random variable f : X → R for which there is a partition {A_1, ..., A_n} of X such that f is constant on each cell A_i of the partition: there are c_1, ..., c_n ∈ R such that f(x) = c_i for all x ∈ A_i, i = 1, ..., n.

Definition 6.4. Let f be a simple random variable as in Definition 6.3. We define the space mean, or expected value, or expectation of f to be

(30)  E(f) = Σ_{i=1}^{n} c_i P(A_i).

Example 6.3. Let the probability space and random variable be as in Example 6.1, so f = s is the sum of the numbers showing. To compute the expected value of s, we partition the set of outcomes according to the value of the sum: let A_j = s^{−1}(j), j = 2, ..., 12. Then we figure out the probability of each cell of the partition. Since all outcomes are assumed to be equally likely, the probability that s(x) = j is the number of outcomes x that produce sum j, times the probability (1/36) of each outcome. Now the numbers of ways to roll 2, 3, ..., 12, respectively, are seen by inspection to be 1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1.
Multiply each value (2 through 12) of the random variable s by the probability that it takes that value (1/36, 2/36,...,1/36) and add these up to get E(s) = 7. Thus 7 is the expected sum on a roll of a pair of dice. This is the mean or average sum. The expected value is not always the same as

the most probable value (if there is one), called the mode, as the next example shows.

Exercise 6.1. Find the expected value of the random variable in Example 6.2.

Exercise 6.2. Suppose that the bit strings of length 7 in Example 6.2 are no longer equally likely but instead are given by the probability measure P^(7) on {0, 1}^(7) determined by P(0) = 1/3, P(1) = 2/3. Now what is the expected value of the number of 1's in a string chosen at random?

The expectation of a random variable f is its average value over the probability space X, taking into account that f may take values in some intervals with greater probability than in others. If the probability space modeled a game in which an entrant received a payoff of f(x) dollars in case the random outcome were x ∈ X, the expectation E(f) would be considered a fair price to pay in order to play the game. (Gambling establishments charge a bit more than this, so that they will probably make a profit.)

We consider now a situation in which we make not just a single measurement f on a probability space (X, B, P) but a sequence of measurements f_1, f_2, f_3, .... A sequence of random variables is called a stochastic process. If the system is in state x ∈ X, then we obtain a sequence of numbers f_1(x), f_2(x), f_3(x), ..., and we think of f_i(x) as the result of the observation that we make on the system at time i = 1, 2, 3, .... It is natural to form the averages of these measurements:

(31)  A_n{f_i}(x) = (1/n) Σ_{k=1}^{n} f_k(x)

is the average of the first n measurements. If we have an infinite sequence f_1, f_2, f_3, ... of measurements, we can try to see whether these averages settle down around a limiting value

(32)  A_∞{f_i}(x) = lim_{n→∞} (1/n) Σ_{k=1}^{n} f_k(x).

Such a limiting value may or may not exist; quite possibly the sequence of measurements will be wild and the averages will not converge to any limit.
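The space mean of Example 6.3 can be recomputed directly from Definition 6.4 by brute force, without the by-inspection count of ways to roll each sum:

```python
from fractions import Fraction
from itertools import product

# Two fair dice: all 36 outcomes equally likely.
X = list(product(range(1, 7), repeat=2))

def s(outcome):
    """The random variable of Example 6.1: the sum of the numbers rolled."""
    i, j = outcome
    return i + j

# E(s) per Definition 6.4: sum over the cells A_j = s^{-1}(j) of j * P(A_j).
E = sum(Fraction(value * sum(1 for x in X if s(x) == value), len(X))
        for value in range(2, 13))

assert E == 7   # the expected sum on a roll of a pair of dice
```

The inner count recovers exactly the sequence 1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1 found by inspection in Example 6.3.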

We may look at the sequence of measurements and time averages in a different way: rather than imagining that we make a sequence of measurements on the system, we may imagine that we make the same measurement f on the system each time, but the system changes with time. This is the viewpoint of dynamical systems theory; in a sense the two viewpoints are equivalent.

Example 6.4. Consider the system of Bernoulli trials (Ω⁺, B^(∞), P_p^(∞)) described above: the space consists of one-sided infinite sequences of 0's and 1's, the bits arriving independently with P(0) = p and P(1) = 1 − p. We can read a sequence in two ways.

(1) For each i = 0, 1, 2, ..., let f_i(x) = x_i. We make a different measurement at each instant, always reading off the bit that is one place more to the right than the previously viewed one.

(2) Define the shift transformation σ : Ω⁺ → Ω⁺ by σ(x_0 x_1 x_2 ...) = x_1 x_2 .... This transformation lops off the first entry in each infinite bit string and shifts the remaining ones one place to the left. For each i = 1, 2, ..., σ^i denotes the composition of σ with itself i times; thus σ² lops off the first two places while shifting the sequence two places to the left. On the set Ω of two-sided infinite sequences we can shift in both directions, so we can consider σ^i for i ∈ Z. Now let f(x) = x_0 for each x ∈ Ω⁺. Then the previous f_i satisfy f_i(x) = f(σ^i x) for all i = 0, 1, 2, .... In this realization, we just sit in one place, always observing the first entry in the bit string x as the string streams by toward the left.

This seems to be perhaps a more relaxed way to make measurements. Besides that, the dynamical viewpoint has many other advantages. For example, many properties of the stochastic process {f(σ^i x)} can be deduced from study of the action of σ on (Ω⁺, B^(∞), P_p^(∞)) alone, independently of the particular choice of f.

Exercise 6.3.
In the example (Ω^+, B^(∞), P_p^(∞)) just discussed, with f(x) = x_0 as above, do you think that the time average

A_∞f(x) = lim_{n→∞} (1/n) ∑_{k=1}^{n} f(σ^k x)

will exist? (For all x? For most x?) If it were to exist usually, what should it be?
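Before committing to an answer, one can experiment numerically: simulate a long string from the Bernoulli source and compute the time averages for increasing n. A Python sketch (the parameter value and names are ours; note that with f(x) = x_0, the k-th summand f(σ^k x) is just x_k):

```python
import random

def time_average(bits, n):
    """(1/n) * sum of f(sigma^k x) for k = 1..n, where f(x) = x_0, so f(sigma^k x) = x_k."""
    return sum(bits[1:n + 1]) / n

random.seed(1)
p = 1 / 3  # P(0) = p, P(1) = 1 - p
x = [0 if random.random() < p else 1 for _ in range(100_001)]
for n in (100, 10_000, 100_000):
    print(n, time_average(x, n))  # the averages appear to settle as n grows
```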

Exercise 6.4. Same as the preceding exercise, but with f replaced by

f(x) = 1 if x_0 x_1 = 01, and f(x) = 0 otherwise.

7. Stationary and ergodic information sources

We have already defined an information source. It consists of the set of one- or two-sided infinite strings Ω^+(D) or Ω(D) with entries from a finite alphabet D; a family B of subsets of the set of strings which contains all the cylinder sets and is closed under complementation and countable unions and intersections; and a probability measure P defined for all sets in B. (For simplicity we continue to delete superscripts on B and P.) We also have the shift transformation, defined on each of Ω^+(D) and Ω(D) by (σx)_i = x_{i+1} for all indices i. If f(x) = x_0, then observing f(σ^k x) for k = 0, 1, 2, ... reads the sequence x = x_0 x_1 x_2 ... as σ makes time go by.

Definition 7.1. An information source as above is called stationary if the probability measure P is shift-invariant: given any word a = a_0 a_1 ... a_{r-1} and any two indices n and m in the allowable range of indices (Z for Ω(D), {0, 1, 2, ...} for Ω^+(D)),

(33)    P{x : x_n = a_0, x_{n+1} = a_1, ..., x_{n+r-1} = a_{r-1}} = P{x : x_m = a_0, x_{m+1} = a_1, ..., x_{m+r-1} = a_{r-1}}.

The idea is that a stationary source emits its symbols, and in fact consecutive strings of symbols, according to a probability measure that does not change with time. The probability of seeing a string such as 001 is the same at time 3 as it is at any other time. Such a source can be thought of as being in an equilibrium state: whatever mechanisms are driving it (which are probably random in some way) are not having their basic principles change with time.

Example 7.1. The Bernoulli sources discussed above are stationary. This is clear from the definition of the probability of the cylinder set determined by any word as the product of the probabilities of the individual symbols in the word.

Example 7.2.
Consider a Markov source as above determined by an initial distribution p and a stochastic transition matrix P. If p is in fact a stable distribution for P (see Remark 5.1), pP = p,

then the Markov process, considered as an information source, is stationary.

Definition 7.2. A stationary information source as above is called ergodic if for every simple random variable f on the set of sequences, the time mean of f almost surely equals the space mean of f. More precisely, the set of sequences x for which

(34)    A_∞f(x) = lim_{n→∞} (1/n) ∑_{k=1}^{n} f(σ^k x) = E(f)

(in the sense that the limit exists and equals E(f)) has probability 1.

In fact, it can be shown that in order to check whether or not a source is ergodic, it is enough to check the definition for random variables f which are the characteristic functions of cylinder sets. Given a word a = a_0 a_1 a_2 ... a_{r-1}, define

(35)    f_a(x) = 1 if x_0 x_1 ... x_{r-1} = a, and f_a(x) = 0 otherwise.

Ergodicity is then seen to be equivalent to requiring that in almost every sequence, every word appears with limiting frequency equal to the probability of the cylinder set defined by that word. Here "almost every sequence" means a set of sequences which has probability one.

Example 7.3. The Bernoulli systems defined above are all ergodic. This is a strong version of Jakob Bernoulli's Law of Large Numbers (1713).

What kinds of sources are not ergodic, you ask? It's easiest to give examples if one knows that ergodicity is equivalent to a kind of indecomposability of the probability space of sequences.

Example 7.4. Let us consider an information source which puts out one-sided sequences on the alphabet D = {0, 1}. Let us suppose that the probability measure P governing the outputs is such that with probability 1/2 we get a constant string of 0's, and otherwise we get a string of 0's and 1's coming independently with equal probabilities.
If we consider the simple random variable f_0, which gives a value of 1 if x_0 = 0 and otherwise gives a value of 0, we see that on a set of probability 1/2 the time mean of f_0 is 1, while on another set of probability 1/2 it is 1/2 (assuming the result stated in Example 7.3). Thus, no matter the value of E(f_0), we cannot possibly have A_∞f_0 = E(f_0) almost surely.
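This splitting of the time means can be seen in simulation. The following Python sketch (all names ours) draws several strings from the mixture source of Example 7.4 and computes the time mean of f_0 over a long finite stretch; the values cluster near the two numbers 1 and 1/2 rather than near a single limit:

```python
import random

def sample_source(n, rng):
    """Draw a length-n prefix from the mixture source of Example 7.4:
    with probability 1/2 the all-zeros string, otherwise iid fair bits."""
    if rng.random() < 0.5:
        return [0] * n
    return [rng.randint(0, 1) for _ in range(n)]

def time_mean_f0(bits):
    """Empirical time mean of f_0, the indicator that the current symbol is 0."""
    return sum(1 for b in bits if b == 0) / len(bits)

rng = random.Random(2)
means = [time_mean_f0(sample_source(50_000, rng)) for _ in range(20)]
print(sorted(means))  # the values cluster near two numbers, 1/2 and 1
```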

Exercise 7.1. Calculate the space mean of the random variable f_0 in the preceding example.

Exercise 7.2. Calculate the space mean and time mean of the random variable f_1 in the preceding example (see Formula (35)).


More information

MATH10040 Chapter 2: Prime and relatively prime numbers

MATH10040 Chapter 2: Prime and relatively prime numbers MATH10040 Chapter 2: Prime and relatively prime numbers Recall the basic definition: 1. Prime numbers Definition 1.1. Recall that a positive integer is said to be prime if it has precisely two positive

More information

Mathematics Course 111: Algebra I Part IV: Vector Spaces

Mathematics Course 111: Algebra I Part IV: Vector Spaces Mathematics Course 111: Algebra I Part IV: Vector Spaces D. R. Wilkins Academic Year 1996-7 9 Vector Spaces A vector space over some field K is an algebraic structure consisting of a set V on which are

More information

In the situations that we will encounter, we may generally calculate the probability of an event

In the situations that we will encounter, we may generally calculate the probability of an event What does it mean for something to be random? An event is called random if the process which produces the outcome is sufficiently complicated that we are unable to predict the precise result and are instead

More information

Notes on Determinant

Notes on Determinant ENGG2012B Advanced Engineering Mathematics Notes on Determinant Lecturer: Kenneth Shum Lecture 9-18/02/2013 The determinant of a system of linear equations determines whether the solution is unique, without

More information

INTERSECTION MATH And more! James Tanton

INTERSECTION MATH And more! James Tanton INTERSECTION MATH And more! James Tanton www.jamestanton.com The following represents a sample activity based on the December 2006 newsletter of the St. Mark s Institute of Mathematics (www.stmarksschool.org/math).

More information

2x + y = 3. Since the second equation is precisely the same as the first equation, it is enough to find x and y satisfying the system

2x + y = 3. Since the second equation is precisely the same as the first equation, it is enough to find x and y satisfying the system 1. Systems of linear equations We are interested in the solutions to systems of linear equations. A linear equation is of the form 3x 5y + 2z + w = 3. The key thing is that we don t multiply the variables

More information

I. GROUPS: BASIC DEFINITIONS AND EXAMPLES

I. GROUPS: BASIC DEFINITIONS AND EXAMPLES I GROUPS: BASIC DEFINITIONS AND EXAMPLES Definition 1: An operation on a set G is a function : G G G Definition 2: A group is a set G which is equipped with an operation and a special element e G, called

More information

6 Scalar, Stochastic, Discrete Dynamic Systems

6 Scalar, Stochastic, Discrete Dynamic Systems 47 6 Scalar, Stochastic, Discrete Dynamic Systems Consider modeling a population of sand-hill cranes in year n by the first-order, deterministic recurrence equation y(n + 1) = Ry(n) where R = 1 + r = 1

More information

Sudoku puzzles and how to solve them

Sudoku puzzles and how to solve them Sudoku puzzles and how to solve them Andries E. Brouwer 2006-05-31 1 Sudoku Figure 1: Two puzzles the second one is difficult A Sudoku puzzle (of classical type ) consists of a 9-by-9 matrix partitioned

More information

DERIVATIVES AS MATRICES; CHAIN RULE

DERIVATIVES AS MATRICES; CHAIN RULE DERIVATIVES AS MATRICES; CHAIN RULE 1. Derivatives of Real-valued Functions Let s first consider functions f : R 2 R. Recall that if the partial derivatives of f exist at the point (x 0, y 0 ), then we

More information

Ch. 13.3: More about Probability

Ch. 13.3: More about Probability Ch. 13.3: More about Probability Complementary Probabilities Given any event, E, of some sample space, U, of a random experiment, we can always talk about the complement, E, of that event: this is the

More information

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers)

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) B Bar graph a diagram representing the frequency distribution for nominal or discrete data. It consists of a sequence

More information

STA 371G: Statistics and Modeling

STA 371G: Statistics and Modeling STA 371G: Statistics and Modeling Decision Making Under Uncertainty: Probability, Betting Odds and Bayes Theorem Mingyuan Zhou McCombs School of Business The University of Texas at Austin http://mingyuanzhou.github.io/sta371g

More information

Chapter 19. General Matrices. An n m matrix is an array. a 11 a 12 a 1m a 21 a 22 a 2m A = a n1 a n2 a nm. The matrix A has n row vectors

Chapter 19. General Matrices. An n m matrix is an array. a 11 a 12 a 1m a 21 a 22 a 2m A = a n1 a n2 a nm. The matrix A has n row vectors Chapter 9. General Matrices An n m matrix is an array a a a m a a a m... = [a ij]. a n a n a nm The matrix A has n row vectors and m column vectors row i (A) = [a i, a i,..., a im ] R m a j a j a nj col

More information