NOTES ON ELEMENTARY PROBABILITY
KARL PETERSEN

1. Probability spaces

Probability theory is an attempt to work mathematically with the relative uncertainties of random events. In order to get started, we do not attempt to estimate the probability of occurrence of any event; instead we assume that somehow these probabilities have already been arrived at and so are given to us in advance. These data are assembled in the form of a probability space (X, B, P), which consists of

(1) a set X, sometimes called the sample space, which is thought of as the set of all possible states of some system, or as the set of all possible outcomes of some experiment;
(2) a family B of subsets of X, which is thought of as the family of observable events; and
(3) a function P : B → [0, 1], which for each observable event E ∈ B gives the probability P(E) of occurrence of that event.

While the set X of all possible outcomes is an arbitrary set, for several reasons, which we will not discuss at this moment, the set B of observable events is not automatically assumed to consist of all subsets of X. (But if X is a finite set, then usually we do take B to be the family of all subsets of X.) We also assume that the family B of observable events and the probability measure P satisfy a minimal list of properties which permit calculation of probabilities of combinations of events:

(1) P(X) = 1.
(2) B contains X and is closed under the set-theoretic operations of union, intersection, and complementation: if E, F ∈ B, then E ∪ F ∈ B, E ∩ F ∈ B, and E^c = X \ E ∈ B. (Recall that E ∪ F is the set of all elements of X that are either in E or in F, E ∩ F is the set of all elements of X that are in both E and F, and E^c is the set of all elements of X that are not in E.)
In fact, in order to permit even more calculations (but not too many), we suppose also that the union and intersection of countably many members of B are still in B.

(3) If E, F ∈ B are disjoint, so that E ∩ F = ∅, then P(E ∪ F) = P(E) + P(F). In fact, we assume that P is countably additive: if E_1, E_2, ... ∈ B are pairwise disjoint (so that E_i ∩ E_j = ∅ if i ≠ j), then

(1) P(⋃_{i=1}^∞ E_i) = P(E_1 ∪ E_2 ∪ ...) = P(E_1) + P(E_2) + ... = Σ_{i=1}^∞ P(E_i).

Example 1.1. In some simple but still interesting and useful cases, X is a finite set such as {0, ..., d−1} and B consists of all subsets of X. Then P is determined by specifying the value p_i = P(i) of each individual point i of X. For example, the single flip of a fair coin is modeled by letting X = {0, 1}, with 0 representing the outcome heads and 1 the outcome tails, and defining P(0) = P(1) = 1/2. Note that the probabilities of all subsets of X are then determined (in the case of the single coin flip, P(X) = 1 and P(∅) = 0).

Exercise 1.1. Set up the natural probability space that describes the roll of a single fair die and find the probability that the outcome of any roll is a number greater than 2.

Exercise 1.2. When a pair of fair dice is rolled, what is the probability that the sum of the two numbers shown (on the upward faces) is even?

Exercise 1.3. In a certain lottery one gets to try to match (after paying an entry fee) a set of 6 different numbers that have been previously chosen from {1, ..., 30}. What is the probability of winning?

Exercise 1.4. What is the probability that a number selected at random from {1, ..., 100} is divisible by both 3 and 7?

Exercise 1.5. A fair coin is flipped 10 times. What is the probability that heads comes up twice in a row?

Exercise 1.6. Ten fair coins are dropped on the floor. What is the probability that at least two of them show heads?

Exercise 1.7. A fair coin is flipped ten times. What is the probability that heads comes up at least twice?

Exercise 1.8.
Show that if E and F are observable events in any probability space, then

(2) P(E ∪ F) = P(E) + P(F) − P(E ∩ F).
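For a finite sample space, the axioms and identities such as (2) can be checked by direct enumeration. A minimal Python sketch for the fair die (the helper names are my own, not from the notes):

```python
from fractions import Fraction

# Fair die: X = {1,...,6}, every subset observable, each point has mass 1/6.
X = frozenset(range(1, 7))
mass = {x: Fraction(1, 6) for x in X}

def P(event):
    """Probability of an event (any subset of X): the sum of its point masses."""
    return sum(mass[x] for x in event)

E = {2, 4, 6}   # "even"
F = {4, 5, 6}   # "greater than 3"

# Axiom (1): total mass is 1.
assert P(X) == 1
# Additivity on the disjoint events {2,4,6} and {1,3,5}.
assert P({2, 4, 6} | {1, 3, 5}) == P({2, 4, 6}) + P({1, 3, 5})
# Inclusion-exclusion, Equation (2).
assert P(E | F) == P(E) + P(F) - P(E & F)
```

Using exact fractions rather than floating point avoids any rounding questions in such checks.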
2. Conditional probability

Let (X, B, P) be a probability space and let Y ∈ B with P(Y) > 0. We can restrict our attention to Y, making it the set of possible states or outcomes for a probability space as follows:

(1) The set of states is Y ⊆ X with P(Y) > 0;
(2) The family of observable events is defined to be
    (3) B_Y = {E ∩ Y : E ∈ B};
(3) The probability measure P_Y is defined on B_Y by
    (4) P_Y(A) = P(A)/P(Y) for all A ∈ B_Y.

Forming the probability space (Y, B_Y, P_Y) is called conditioning on Y. It models the revision of probability assignments when the event Y is known to have occurred: we think of P_Y(A) as the probability that A occurred, given that we already know that Y occurred.

Example 2.1. When a fair die is rolled, the probability of an even number coming up is 1/2. What is the probability that an even number came up if we are told that the number showing is greater than 3? Out of the three possible outcomes in Y = {4, 5, 6}, two are even, so the answer is 2/3.

Definition 2.1. For any (observable) Y ⊆ X with P(Y) > 0 and any (observable) E ⊆ X we define the conditional probability of E given Y to be

(5) P(E | Y) = P(E ∩ Y)/P(Y) = P_Y(E ∩ Y).

Exercise 2.1. A fair coin is flipped three times. What is the probability of at least one head? Given that the first flip was tails, what is the probability of at least one head?

Exercise 2.2. From a group of two men and three women a set of three representatives is to be chosen. Each member is equally likely to be selected. Given that the set includes at least one member of each sex, what is the probability that there are more men than women in it?

Definition 2.2. The observable events A and B in a probability space (X, B, P) are said to be independent in case

(6) P(A ∩ B) = P(A)P(B).
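Definition 2.1 translates directly into a computation; here is a sketch reproducing Example 2.1 (the function names are mine, chosen for illustration):

```python
from fractions import Fraction

mass = {x: Fraction(1, 6) for x in range(1, 7)}  # fair die

def P(event):
    return sum(mass[x] for x in event)

def cond_P(E, Y):
    """Conditional probability P(E | Y) = P(E ∩ Y) / P(Y), as in Equation (5)."""
    assert P(Y) > 0
    return P(set(E) & set(Y)) / P(Y)

even = {2, 4, 6}
Y = {4, 5, 6}            # "the number showing is greater than 3"
print(cond_P(even, Y))   # -> 2/3, as in Example 2.1
```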
Notice that in case one of the events has positive probability, say P(B) > 0, then A and B are independent if and only if

(7) P(A | B) = P(A);

that is, knowing that B has occurred does not change the probability that A has occurred.

Example 2.2. A fair coin is flipped twice. What is the probability that heads occurs on the second flip, given that it occurs on the first flip? We model the two flips of the coin by bit strings of length two, writing 0 for heads and 1 for tails on each of the two flips. If Y is the set of outcomes which have heads on the first flip, and A is the set that has heads on the second flip, then X = {00, 01, 10, 11}, Y = {00, 01}, A = {10, 00}, so that A ∩ Y = {00} includes exactly one of the two elements of Y. Since each of the four outcomes in X is equally likely,

P(A | Y) = P(A ∩ Y)/P(Y) = |A ∩ Y|/|Y| = 1/2 = P(A).

Thus we see that A and Y are independent.

This example indicates that the definition of independence in probability theory reflects our intuitive notion of events whose occurrences do not influence one another. If repeated flips of a fair coin are modeled by a probability space consisting of bit strings of length n, all being equally likely, then an event whose occurrence is determined by a certain range of coordinates is independent of any other event that is determined by a disjoint range of coordinates.

Example 2.3. A fair coin is flipped four times. Let A be the event that we obtain a head on the second flip and B be the event that among the first, third, and fourth flips we obtain at least two heads. Then A and B are independent.

Exercise 2.3. Show that the events A and B described in the preceding example will be independent whether or not the coin being flipped is fair.
Exercise 2.4. Show that the events A and B described in the preceding example will be independent even if the probability of heads could be different on each flip.

Exercise 2.5. When a pair of fair dice is rolled, is the probability of the sum of the numbers shown being even independent of it being greater than six?

3. Bayes Theorem

Looking at the definition of conditional probability kind of backwards leads very easily to a simple formula that is highly useful in practice and has profound implications for the foundations of probability theory (frequentists, subjectivists, etc.). We use the notation from [1], in which C is an event, thought of as a cause, such as the presence of a disease, and I is another event, thought of as the existence of certain information. The formula can be interpreted as telling us how to revise our original estimate P(C) that the cause C is present if we are given the information I.

Theorem 3.1 (Bayes Theorem). Let (X, B, P) be a probability space and let C, I ∈ B with P(I) > 0. Then

(8) P(C | I) = P(C) P(I | C)/P(I).

Proof. We just use the definitions of the conditional probabilities:

(9) P(C | I) = P(C ∩ I)/P(I),   P(I | C) = P(I ∩ C)/P(C),

and the fact that C ∩ I = I ∩ C.

Example 3.1. We discuss the example in [1, p. 77] in this notation. C is the event that a patient has cancer, and P(C) is taken to be .01, the incidence of cancer in the general population for this example being taken to be 1 in 100. I is the event that the patient tests positive on a certain test for this disease. The test is said to be 99% accurate, which we take to mean that the probability of error is less than .01, in the sense that P(I | C^c) < .01 and P(I^c | C) < .01. Then P(I | C) ≈ 1, and

(10) P(I) = P(I | C)P(C) + P(I | C^c)P(C^c) ≈ .01 + (.01)(.99) ≈ .02.
Applying Bayes Theorem,

(11) P(C | I) = P(C) P(I | C)/P(I) ≈ (.01)(1)/(.02) = 1/2.

The surprising conclusion is that even with such an apparently accurate test, if someone tests positive for this cancer there is only a 50% chance that he actually has the disease.

Often Bayes Theorem is stated in a form in which there are several possible causes C_1, C_2, ... which might lead to a result I with P(I) > 0. If we assume that the observable events C_1, C_2, ... form a partition of the probability space X, so that they are pairwise disjoint and their union is all of X, then

(12) P(I) = P(I | C_1)P(C_1) + P(I | C_2)P(C_2) + ...,

and Equation (8) says that for each i,

(13) P(C_i | I) = P(C_i) P(I | C_i) / [P(I | C_1)P(C_1) + P(I | C_2)P(C_2) + ...].

This formula applies for any finite number of observable events C_i as well as for a countably infinite number of them.

Exercise 3.1. Suppose we want to use a set of medical tests to look for the presence of one of two diseases. Denote by S the event that the test gives a positive result and by D_i the event that a patient has disease i = 1, 2. Suppose we know the incidences of the two diseases in the population:

(14) P(D_1) = .07,  P(D_2) = .05,  P(D_1 ∩ D_2) = .01.

From studies of many patients over the years it has also been learned that

(15) P(S | D_1) = .9,  P(S | D_2) = .8,  P(S | (D_1 ∪ D_2)^c) = .05,  P(S | D_1 ∩ D_2) = .99.

(a) Form a partition of the underlying probability space X that will help to analyze this situation.
(b) Find the probability that a patient has disease 1 if the battery of tests turns up positive.
(c) Find the probability that a patient has disease 1 but not disease 2 if the battery of tests turns up positive.
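The computation in Example 3.1 can be reproduced exactly. A sketch in Python, taking P(I | C) = .99 and P(I | C^c) = .01 (the notes only bound these quantities by the stated accuracy, so these exact values are an assumption made for illustration):

```python
from fractions import Fraction

def bayes_posterior(prior, like_C, like_not_C):
    """P(C | I) via Equation (8), with P(I) expanded as in Equation (10)."""
    p_I = like_C * prior + like_not_C * (1 - prior)  # total probability of a positive test
    return prior * like_C / p_I

prior = Fraction(1, 100)       # P(C): incidence of the disease
like_C = Fraction(99, 100)     # assumed value of P(I | C)
like_not_C = Fraction(1, 100)  # assumed value of P(I | C^c)

print(bayes_posterior(prior, like_C, like_not_C))   # -> 1/2
```

With these values the posterior is exactly 1/2, matching the surprising conclusion of the example.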
4. Bernoulli trials

In Section 2 we came across independent repeated trials of an experiment, such as flipping a coin or rolling a die. Such a sequence is conveniently represented by a probability space whose elements are strings on a finite alphabet. Equivalently, if a single run of the experiment is modeled by a probability space (D, B, P), then n independent repetitions of the experiment are modeled by the Cartesian product of D with itself n times, with the probability measure formed by a product of P with itself n times. We now state this more precisely.

Let (D, B, P) be a probability space with D = {0, ..., d−1}, B = the family of all subsets of D, and P(i) = p_i > 0 for i = 0, ..., d−1. Denote by D^(n) the Cartesian product of D with itself n times. Thus D^(n) consists of all ordered n-tuples (x_1, ..., x_n) with each x_i ∈ D, i = 1, ..., n. If we omit the commas and parentheses, we can think of each element of D^(n) as a string of length n on the alphabet D.

Example 4.1. If D = {0, 1} and n = 3, then D^(3) = {000, 001, 010, 011, 100, 101, 110, 111}, the set of all bit strings of length 3.

We now define the set of observables in D^(n) to be B^(n) = the family of all subsets of D^(n). The probability measure P^(n) on D^(n) is determined by

(16) P^(n)(x_1 x_2 ... x_n) = P(x_1)P(x_2) ... P(x_n)

for each x_1 x_2 ... x_n ∈ D^(n). This definition of P^(n) in terms of products of probabilities seen in the different coordinates (or entries) of a string guarantees the independence of two events that are determined by disjoint ranges of coordinates. Note that this holds true even if the strings of length n are not all equally likely.

Exercise 4.1. A coin whose probability of heads is p, with 0 < p < 1/2, is flipped three times. Write out the probabilities of all the possible outcomes. If A is the event that the second flip produces heads, and B is the event that either the first or third flip produces tails, find P^(3)(A ∩ B) and P^(3)(A)P^(3)(B).
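The product measure (16) is easy to realize concretely, and the independence of events determined by disjoint coordinate ranges can be checked by enumeration. A sketch for a biased coin (the value p = 1/3 and the helper names are my own illustrative choices):

```python
from fractions import Fraction
from itertools import product

p = {  # single-trial distribution on D = {0, 1}: 0 = heads, 1 = tails
    0: Fraction(1, 3),
    1: Fraction(2, 3),
}

def P_n(string):
    """Product measure of Equation (16): multiply single-symbol probabilities."""
    result = Fraction(1)
    for symbol in string:
        result *= p[symbol]
    return result

strings = list(product([0, 1], repeat=3))

# The measure of the whole space D^(3) is 1, as it must be.
assert sum(P_n(s) for s in strings) == 1

# Events determined by disjoint coordinate ranges are independent:
# A = heads on the second flip, B = tails on the first or third flip.
A = [s for s in strings if s[1] == 0]
B = [s for s in strings if s[0] == 1 or s[2] == 1]
PA = sum(P_n(s) for s in A)
PB = sum(P_n(s) for s in B)
PAB = sum(P_n(s) for s in strings if s in A and s in B)
assert PAB == PA * PB
```

This is exactly the comparison asked for in Exercise 4.1, here with one particular value of p.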
Let D = {0, 1} and P(0) = p ∈ (0, 1), P(1) = 1 − p. Construct as above the probability space (D^(n), B^(n), P^(n)) representing n independent repetitions of the experiment (D, B, P). The binomial distribution gives the probability, for each k = 0, 1, ..., n, of the set of strings of length n that contain exactly k 0's. Recall that C(n, k) denotes the binomial coefficient n!/(k!(n−k)!), the number of k-element subsets of a set with n elements.

Proposition 4.1. Let (D^(n), B^(n), P^(n)) be as described above. Then for each k = 0, 1, ..., n,

(17) P^(n){x_1 ... x_n ∈ D^(n) : x_i = 0 for exactly k choices of i = 1, ..., n} = C(n, k) p^k (1−p)^(n−k).

Proof. For each subset S of {1, ..., n}, let E(S) = {x ∈ D^(n) : x_i = 0 if and only if i ∈ S}. Note that if S_1 and S_2 are different subsets of {1, ..., n}, then E(S_1) and E(S_2) are disjoint. Fix k = 0, 1, ..., n. There are C(n, k) subsets of {1, ..., n} which have exactly k elements, and for each such subset S we have P^(n)(E(S)) = p^k (1−p)^(n−k). Adding up the probabilities of these disjoint sets gives the result.

Exercise 4.2. For the situation in Exercise 4.1 and each k = 0, 1, 2, 3, list the elements of A_k = the event that exactly k heads occur. Also calculate the probability of each A_k.

Representing repetitions of an experiment with finitely many possible outcomes by strings on a finite alphabet draws an obvious connection with the modeling of information transfer or acquisition. A single experiment can be viewed as reading a single symbol, which is thought of as the outcome of the experiment. We can imagine strings (or experimental runs) of arbitrary lengths, and in fact even of infinite length. For example, we can consider the space of one-sided infinite bit strings

(18) Ω^+ = {x_0 x_1 x_2 ... : each x_i = 0 or 1},

as well as the space of two-sided infinite bit strings

(19) Ω = {... x_{−1} x_0 x_1 ... : each x_i = 0 or 1}.
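Proposition 4.1 can be confirmed by brute force for small n. A sketch (n = 5 and p = 1/3 are arbitrary illustrative choices):

```python
from fractions import Fraction
from itertools import product
from math import comb   # comb(n, k) is the binomial coefficient C(n, k)

p = Fraction(1, 3)      # illustrative value of P(0)
n = 5

def binomial_prob(k):
    """Right-hand side of Equation (17)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def brute_force_prob(k):
    """Sum the product measure over all strings with exactly k zeros."""
    total = Fraction(0)
    for s in product([0, 1], repeat=n):
        if s.count(0) == k:
            prob = Fraction(1)
            for symbol in s:
                prob *= p if symbol == 0 else 1 - p
            total += prob
    return total

for k in range(n + 1):
    assert binomial_prob(k) == brute_force_prob(k)
```

The assertion passing for every k mirrors the counting argument in the proof: the C(n, k) disjoint sets E(S) each carry mass p^k (1−p)^(n−k).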
Given p with 0 < p < 1, we can again define a probability measure for many events in either of these spaces: for example,

(20) P_p^(∞)({x : x_2 = 0, x_6 = 1, x_7 = 1}) = p(1−p)(1−p).

A set such as the one above, determined by specifying the entries in a finite number of places in a string, is called a cylinder set. Let us define the probability of each cylinder set in accord with the idea that 0's and 1's are coming independently, with probabilities p and 1−p, respectively. Thus, if 0 ≤ i_1 < i_2 < ... < i_r, each a_1, ..., a_r = 0 or 1, and s of the a_j's are 0, let

(21) P_p^(∞){x ∈ Ω^+ : x_{i_1} = a_1, ..., x_{i_r} = a_r} = p^s (1−p)^(r−s).

It takes some effort (which we will not expend at this moment) to see that this definition does not lead to any contradictions, and that there is a unique extension of P_p^(∞) so as to be defined on a family B^(∞) which contains all the cylinder sets and is closed under complementation, countable unions, and countable intersections.

Definition 4.1. If D is an arbitrary finite set, we denote by Ω^+(D) the set of one-sided infinite strings x_0 x_1 x_2 ... with entries from the alphabet D, and we denote by Ω(D) the set of two-sided infinite strings with entries from D. We abbreviate Ω^+ = Ω^+({0, 1}) and Ω = Ω({0, 1}). With each of these sequence spaces we deal always with a fixed family B of observable events which contains the cylinder sets and is closed under countable unions, countable intersections, and complementation.

The spaces Ω^+(D) and Ω(D) are useful models of information sources, especially when combined with a family of observables B which contains all cylinder sets and with a probability measure P defined on B. (We are dropping the extra superscripts on B and P in order to simplify the notation.)
Given a string a = a_0 ... a_{r−1} on the symbols of the alphabet D and a time n ≥ 0, the probability that the source emits the string at time n is given by the probability of the cylinder set {x : x_n = a_0, x_{n+1} = a_1, ..., x_{n+r−1} = a_{r−1}}.

Requiring that countable unions and intersections of observable events be observable allows us to consider quite interesting and complicated events, including various combinations of infinite sequences of events.

Example 4.2. In the space Ω^+ constructed above, with the probability measure P_p^(∞), let us see that the set of (one-sided) infinite strings which contain infinitely many 0's has probability 1. For this purpose
we assume (as can be proved rigorously) that the probability space (Ω^+, B^(∞), P_p^(∞)) does indeed satisfy the properties set out axiomatically at the beginning of these notes.

Let A = {x ∈ Ω^+ : x_i = 0 for infinitely many i}. We aim to show that P_p^(∞)(A^c) = 0 (A^c = Ω^+ \ A = the complement of A), and hence P_p^(∞)(A) = 1. For each n = 0, 1, 2, ... let

B_n = {x ∈ Ω^+ : x_n = 0 but x_i = 1 for all i > n},

and let B_{−1} consist of the single string consisting entirely of 1's. Then the sets B_n are pairwise disjoint and their union is A^c. By countable additivity,

P_p^(∞)(⋃_{n=−1}^∞ B_n) = Σ_{n=−1}^∞ P_p^(∞)(B_n),

so it is enough to show that P_p^(∞)(B_n) = 0 for all n. Fix any n = −1, 0, 1, 2, .... For each r = 1, 2, ...,

B_n ⊆ Z_{n+1,n+r} = {x ∈ Ω^+ : x_{n+1} = x_{n+2} = ... = x_{n+r} = 1},

and P_p^(∞)(Z_{n+1,n+r}) = (1−p)^r. Since 0 < 1−p < 1, we have (1−p)^r → 0 as r → ∞, so P_p^(∞)(B_n) = 0 for each n.

If A is an observable event in any probability space which has probability 1, then we say that A occurs almost surely, or with probability 1. If some property holds for all points x in a set of probability 1, then we say that the property holds almost everywhere.

Exercise 4.3. In the probability space (Ω^+, B^(∞), P_p^(∞)) constructed above, find the probability of the set of infinite strings of 0's and 1's which never have two 1's in a row. (Hint: For each n = 0, 1, 2, ... consider B_n = {x ∈ Ω^+ : x_{2n}x_{2n+1} ≠ 11}.)
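The heart of the argument in Example 4.2 is that the bound (1−p)^r can be made as small as we like. A quick numeric illustration (p = 1/3 is just an example value):

```python
p = 1/3  # illustrative bit probability

# P(Z_{n+1,n+r}) = (1 - p)^r: the chance of seeing r consecutive 1's.
bounds = [(1 - p) ** r for r in (1, 10, 100)]
print(bounds)

# The bound shrinks geometrically, which is what forces P(B_n) = 0.
assert bounds[0] > bounds[1] > bounds[2]
assert bounds[2] < 1e-17
```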
5. Markov chains

Symbols in strings or outcomes of repeated experiments are not always completely independent of one another; frequently there are relations, interactions, or dependencies among the entries in various coordinates. In English text, the probabilities of letters depend heavily on letters near them: h is much more likely to follow t than to follow f. Some phenomena can show very long-range order, even infinite memory. Markov chains model processes with only short-range memory, in which the probability of what symbol comes next depends only on a fixed number of the immediately preceding symbols. In the simplest case, 1-step Markov chains, the probability of what comes next depends only on the immediately preceding symbol. The outcome of any repetition of the experiment depends only on the outcome of the immediately preceding one and not on any before that.

The precise definition of a Markov chain on a finite state space, or alphabet, D = {0, 1, ..., d−1} is as follows. The sample space is the set Σ^+ of all one-sided (could also be two-sided) infinite sequences x = x_0 x_1 ... with entries from the alphabet D. The family of observable events again contains all the cylinder sets. The probability measure M is determined by two pieces of data:

(1) a probability vector p = (p_0, ..., p_{d−1}), with each p_i ≥ 0 and p_0 + ... + p_{d−1} = 1, giving the initial distribution for the chain;
(2) a matrix P = (P_ij) giving the transition probabilities between each pair of states i, j ∈ D. It is assumed that each P_ij ≥ 0 and that for each i we have P_{i,0} + P_{i,1} + ... + P_{i,d−1} = 1. Such a P is called a stochastic matrix.

Now the probability of each basic cylinder set determined by fixing the first n entries at values a_0, ..., a_{n−1} ∈ D is defined to be

(22) M{x ∈ Σ^+ : x_0 = a_0, ..., x_{n−1} = a_{n−1}} = p_{a_0} P_{a_0 a_1} P_{a_1 a_2} ... P_{a_{n−2} a_{n−1}}.

The idea here is simple.
The initial symbol of a string, at coordinate 0, is selected with probability determined by the initial distribution p: symbol i has probability p_i of appearing, for each i = 0, 1, ..., d−1. Then, given that symbol a_0, the probability of transitioning to any other symbol is determined by the entries in the matrix P, specifically the entries in row a_0: the probability that a_1 comes next, given that we just saw a_0, is P_{a_0 a_1}. And so on. The condition that the matrix P have
row sums 1 tells us that we are sure to be able to add some symbol each time. The 1-step memory property can be expressed as follows. For any choice of symbols a_0, ..., a_n,

M{x ∈ Σ^+ : x_n = a_n | x_0 = a_0, ..., x_{n−1} = a_{n−1}} = M{x ∈ Σ^+ : x_n = a_n | x_{n−1} = a_{n−1}}.

Finite-state Markov chains are conveniently visualized in terms of random paths on directed graphs.

[Figure: a directed graph on the states 0, 1, 2, with the transition probabilities as labels on the arrows.]

Here the states are 0, 1, 2 and the transition probabilities between states are the labels on the arrows. Thus the stochastic transition matrix is

P = ( 0    1/2  1/2
      1/2  1/4  1/4
      1/2  0    1/2 ).

If we specified an initial distribution p = (1/6, 1/2, 1/3) listing the initial probabilities of the states 0, 1, 2, respectively, then the probabilities of strings starting at the initial coordinate would be calculated as in this example:

M{x ∈ Σ^+ : x_0 = 1, x_1 = 1, x_2 = 0} = p_1 P_{11} P_{10} = (1/2)(1/4)(1/2) = 1/16.

Exercise 5.1. For the example above, with p and P as given, find the probabilities of all the positive-probability strings of length 3.

Recall that the vector p = (p_0, ..., p_{d−1}) gives the initial distribution: the probability that at time 0 the system is in state j ∈ {0, ..., d−1} is p_j. So what is the probability that the system is in state j at time 1? Well, the event that the system is in state j at time 1, namely {x ∈ Σ^+ : x_1 = j}, is the union of d disjoint sets defined by the different
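Formula (22) can be computed mechanically. A sketch using the example's initial distribution and a transition matrix consistent with it (the first row of the matrix here is my assumption, since only rows 1 and 2 enter this particular calculation):

```python
from fractions import Fraction as F

p = [F(1, 6), F(1, 2), F(1, 3)]   # initial distribution on states 0, 1, 2
P = [[F(0), F(1, 2), F(1, 2)],    # transition matrix, one row per state
     [F(1, 2), F(1, 4), F(1, 4)], # (first row assumed for illustration)
     [F(1, 2), F(0), F(1, 2)]]

def word_prob(word):
    """Probability of the cylinder {x : x_0 = word[0], ..., x_{n-1} = word[n-1]},
    computed as in Equation (22)."""
    prob = p[word[0]]
    for a, b in zip(word, word[1:]):
        prob *= P[a][b]
    return prob

print(word_prob([1, 1, 0]))   # p_1 * P_11 * P_10 = (1/2)(1/4)(1/2) -> 1/16
```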
possible values of x_0:

(23) {x ∈ Σ^+ : x_1 = j} = ⋃_{i=0}^{d−1} {x ∈ Σ^+ : x_0 = i, x_1 = j}.

Since the i-th one of these sets has probability p_i P_ij, we have

(24) M{x ∈ Σ^+ : x_1 = j} = Σ_{i=0}^{d−1} p_i P_ij.

So we have determined the distribution p^(1) of the chain at time 1. The equations

(25) p^(1)_j = Σ_{i=0}^{d−1} p_i P_ij   for j = 0, ..., d−1

are abbreviated, using multiplication of vectors by matrices, by

(26) p^(1) = pP.

Similarly, the distribution at time 2 is given by

(27) p^(2) = p^(1) P = pP^2,

where P^2 is the square of the matrix P according to matrix multiplication. And so on: the probability that at any time n = 0, 1, 2, ... the chain is in state j = 0, ..., d−1 is (pP^n)_j, namely, the j-th entry of the vector obtained by multiplying the initial distribution vector p on the right n times by the stochastic transition matrix P.

Here's a quick definition of matrix multiplication. Suppose that A is a matrix with m rows and n columns (m, n ≥ 1; if either equals 1, A is a (row or column) vector). Suppose that B is a matrix with n rows and p columns. Then AB is defined as a matrix with m rows and p columns. The entry in the i-th row and j-th column of the product AB is formed by using the i-th row of A and the j-th column of B: take the sum of the products of the entries in the i-th row of A (there are n of them) with the entries in the j-th column of B (there are also n of these); this is the dot product or scalar product of the i-th row of A with the j-th column of B:

(28) (AB)_ij = Σ_{k=1}^n A_ik B_kj,   for i = 1, ..., m; j = 1, ..., p.

Note that here we have numbered entries starting with 1 rather than with 0. (This is how Matlab usually does it.)
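The evolution p^(n) = pP^n can be sketched with the vector-matrix multiplication of (25) written out by hand (reusing the example chain's data; as before, the first matrix row is my illustrative assumption):

```python
from fractions import Fraction as F

p = [F(1, 6), F(1, 2), F(1, 3)]   # initial distribution
P = [[F(0), F(1, 2), F(1, 2)],    # transition matrix (first row assumed)
     [F(1, 2), F(1, 4), F(1, 4)],
     [F(1, 2), F(0), F(1, 2)]]

def step(dist, P):
    """One time step: (dist P)_j = sum_i dist_i P_ij, Equation (25)."""
    d = len(dist)
    return [sum(dist[i] * P[i][j] for i in range(d)) for j in range(d)]

dist = p
for n in range(1, 4):
    dist = step(dist, P)
    print(n, dist)
    assert sum(dist) == 1   # a probability distribution stays a distribution
```

Iterating `step` n times is exactly multiplying p on the right n times by P.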
Markov chains have many applications in physics, biology, psychology (learning theory), and even sociology. Here is an unrealistic indication of possible applications.

Exercise 5.2. Suppose that a certain study divides women into three groups according to their level of education: completed college, completed high school but not college, or did not complete high school. Suppose that data are accumulated showing that the daughter of a college-educated mother has probability .7 of also completing college, probability .2 of only making it through high school, and probability .1 of not finishing high school; the daughter of a mother who only finished high school has probabilities .5, .3, and .2, respectively, of finishing college, high school only, or neither; and the daughter of a mother who did not finish high school has corresponding probabilities .3, .4, and .3.

(a) We start with a population in which 30% of women finished college, 50% finished high school but not college, and 20% did not finish high school. What is the probability that a granddaughter of one of these women who never finished high school will make it through college?
(b) Suppose that the initial distribution among the different groups is (.5857, .2571, .1571). What will be the distribution in the next generation? The one after that? The one after that?

Remark 5.1. Under some not too stringent hypotheses, the powers P^k of the stochastic transition matrix P of a Markov chain will converge to a matrix Q all of whose rows are equal to the same vector q, which then satisfies qP = q and is called the stable distribution for the Markov chain. You can try this out easily in Matlab by starting with various stochastic matrices P and squaring repeatedly.

6. Space mean and time mean

Definition 6.1. A random variable on a probability space (X, B, P) is a function f : X → R such that for each interval (a, b) of real numbers, the event {x ∈ X : f(x) ∈ (a, b)} is an observable event.
More briefly,

(29) f^{−1}(a, b) ∈ B for all a, b ∈ R.

This definition seeks to capture the idea of making measurements on a random system, without getting tangled in talk about numbers fluctuating in unpredictable ways.
Example 6.1. In an experiment of rolling two dice, a natural sample space is X = {(i, j) : i, j = 1, ..., 6}. We take B = the family of all subsets of X and assume that all 36 outcomes are equally likely. One important random variable on this probability space is the sum of the numbers rolled: s(i, j) = i + j for all (i, j) ∈ X.

Example 6.2. If X is the set of bit strings of length 7, B = all subsets of X, and all strings are equally likely, we could consider the random variable s(x) = x_0 + ... + x_6 = number of 1's in x.

In the following definitions let (X, B, P) be a probability space.

Definition 6.2. A partition of X is a family {A_1, ..., A_n} of observable subsets of X (each A_i ∈ B) which are pairwise disjoint and whose union is X. The sets A_i are called the cells of the partition.

Definition 6.3. A simple random variable on X is a random variable f : X → R for which there is a partition {A_1, ..., A_n} of X such that f is constant on each cell A_i of the partition: there are c_1, ..., c_n ∈ R such that f(x) = c_i for all x ∈ A_i, i = 1, ..., n.

Definition 6.4. Let f be a simple random variable as in Definition 6.3. We define the space mean, or expected value, or expectation of f to be

(30) E(f) = Σ_{i=1}^n c_i P(A_i).

Example 6.3. Let the probability space and random variable be as in Example 6.1, f = s the sum of the numbers showing. To compute the expected value of s, we partition the set of outcomes according to the value of the sum: let A_j = s^{−1}(j), j = 2, ..., 12. Then we figure out the probability of each cell of the partition. Since all outcomes are assumed to be equally likely, the probability that s(x) = j is the number of outcomes x that produce sum j, times the probability (1/36) of each outcome. Now the numbers of ways to roll 2, 3, ..., 12, respectively, are seen by inspection to be 1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1.
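These counts, and the expectation computed from them via Equation (30), can be verified by enumerating all 36 outcomes; a sketch:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # the 36 equally likely rolls
mass = Fraction(1, 36)

# Number of ways to roll each sum 2, ..., 12.
ways = [sum(1 for (i, j) in outcomes if i + j == total) for total in range(2, 13)]
print(ways)   # -> [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]

# Space mean of s(i, j) = i + j, as in Equation (30):
# each value of the sum, weighted by the probability of its cell.
expectation = sum(total * w * mass for total, w in zip(range(2, 13), ways))
assert expectation == 7
```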
Multiply each value (2 through 12) of the random variable s by the probability that it takes that value (1/36, 2/36,...,1/36) and add these up to get E(s) = 7. Thus 7 is the expected sum on a roll of a pair of dice. This is the mean or average sum. The expected value is not always the same as
the most probable value (if there is one), called the mode, as the next example shows.

Exercise 6.1. Find the expected value of the random variable in Example 6.2.

Exercise 6.2. Suppose that the bit strings of length 7 in Example 6.2 are no longer equally likely but instead are given by the probability measure P^(7) on {0, 1}^(7) determined by P(0) = 1/3, P(1) = 2/3. Now what is the expected value of the number of 1's in a string chosen at random?

The expectation of a random variable f is its average value over the probability space X, taking into account that f may take values in some intervals with greater probability than in others. If the probability space modeled a game in which an entrant received a payoff of f(x) dollars in case the random outcome were x ∈ X, the expectation E(f) would be considered a fair price to pay in order to play the game. (Gambling establishments charge a bit more than this, so that they will probably make a profit.)

We consider now a situation in which we make not just a single measurement f on a probability space (X, B, P) but a sequence of measurements f_1, f_2, f_3, .... A sequence of random variables is called a stochastic process. If the system is in state x ∈ X, then we obtain a sequence of numbers f_1(x), f_2(x), f_3(x), ..., and we think of f_i(x) as the result of the observation that we make on the system at time i = 1, 2, 3, .... It is natural to form the averages of these measurements:

(31) A_n{f_i}(x) = (1/n) Σ_{k=1}^n f_k(x)

is the average of the first n measurements. If we have an infinite sequence f_1, f_2, f_3, ... of measurements, we can try to see whether these averages settle down around a limiting value

(32) A_∞{f_i}(x) = lim_{n→∞} (1/n) Σ_{k=1}^n f_k(x).

Such a limiting value may or may not exist; quite possibly the sequence of measurements will be wild and the averages will not converge to any limit.
We may look at the sequence of measurements and time averages in a different way: rather than imagining that we make a sequence of measurements on the system, we may imagine that we make the same measurement f on the system each time, but the system changes with time. This is the viewpoint of dynamical systems theory; in a sense the two viewpoints are equivalent.

Example 6.4. Consider the system of Bernoulli trials (Ω^+, B^(∞), P_p^(∞)) described above: the space consists of one-sided infinite sequences of 0's and 1's, the bits arriving independently with P(0) = p and P(1) = 1 − p. We can read a sequence in two ways.

(1) For each i = 0, 1, 2, ..., let f_i(x) = x_i. We make a different measurement at each instant, always reading off the bit that is one place more to the right than the previously viewed one.

(2) Define the shift transformation σ : Ω^+ → Ω^+ by σ(x_0 x_1 x_2 ...) = x_1 x_2 .... This transformation lops off the first entry in each infinite bit string and shifts the remaining ones one place to the left. For each i = 1, 2, ..., σ^i denotes the composition of σ with itself i times; thus σ^2 lops off the first two places while shifting the sequence two places to the left. On the set Ω of two-sided infinite sequences we can shift in both directions, so we can consider σ^i for i ∈ Z. Now let f(x) = x_0 for each x ∈ Ω^+. Then the previous f_i(x) = f(σ^i x) for all i = 0, 1, 2, ....

In this realization, we just sit in one place, always observing the first entry in the bit string x as the string streams by toward the left. This seems to be maybe a more relaxed way to make measurements. Besides that, the dynamical viewpoint has many other advantages. For example, many properties of the stochastic process {f(σ^i x)} can be deduced from study of the action of σ on (Ω^+, B^(∞), P_p^(∞)) alone, independently of the particular choice of f.

Exercise 6.3.
In the example (Ω^+, B, P_p) just discussed, with f(x) = x_0 as above, do you think that the time average

A_\infty f(x) = \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} f(σ^k x)

will exist? (For all x? For most x?) If it were to exist usually, what should it be?
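The shift realization in Example 6.4 can be sketched on finite prefixes. In the hypothetical fragment below, string slicing stands in for the one-sided shift, and only a finite prefix of x is used:

```python
def shift(x):
    """The shift transformation sigma: drop the first symbol."""
    return x[1:]

def f(x):
    """The observable f(x) = x_0 (first symbol, as an int)."""
    return int(x[0])

x = "0110100110"
# Reading f(sigma^k x) for k = 0, 1, 2, ... recovers the symbols of x.
readout = []
y = x
for _ in range(len(x)):
    readout.append(f(y))
    y = shift(y)
print(readout)  # [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
```

The readout reproduces the bits of x in order, illustrating the identity f_i(x) = f(σ^i x) on this prefix.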
Exercise 6.4. Same as the preceding exercise, but with f replaced by

f(x) = 1 if x_0 x_1 = 01, and f(x) = 0 otherwise.

7. Stationary and ergodic information sources

We have already defined an information source. It consists of the set of one- or two-sided infinite strings Ω^+(D) or Ω(D) with entries from a finite alphabet D; a family B of subsets of the set of strings which contains all the cylinder sets and is closed under complementation and countable unions and intersections; and a probability measure P defined for all sets in B. (For simplicity we continue to delete superscripts on B and P.) We also have the shift transformation, defined on each of Ω^+(D) and Ω(D) by (σx)_i = x_{i+1} for all indices i. If f(x) = x_0, then observing f(σ^k x) for k = 0, 1, 2, ... reads the sequence x = x_0 x_1 x_2 ... as σ makes time go by.

Definition 7.1. An information source as above is called stationary if the probability measure P is shift-invariant: given any word a = a_0 a_1 ... a_{r−1} and any two indices n and m in the allowable range of indices (Z for Ω(D), {0, 1, 2, ...} for Ω^+(D)),

(33) P{x : x_n = a_0, x_{n+1} = a_1, ..., x_{n+r−1} = a_{r−1}} = P{x : x_m = a_0, x_{m+1} = a_1, ..., x_{m+r−1} = a_{r−1}}.

The idea is that a stationary source emits its symbols, and in fact consecutive strings of symbols, according to a probability measure that does not change with time. The probability of seeing a string such as 001 is the same at time 3 as it is at any other time. Such a source can be thought of as being in an equilibrium state: whatever mechanisms are driving it (which are probably random in some way) are not having their basic principles change with time.

Example 7.1. The Bernoulli sources discussed above are stationary. This is clear from the definition of the probability of the cylinder set determined by any word as the product of the probabilities of the individual symbols in the word.

Example 7.2.
Consider a Markov source as above determined by an initial distribution p and a stochastic transition matrix P. If p is in fact a stable distribution for P (see Remark 5.1), pP = p,
then the Markov process, considered as an information source, is stationary.

Definition 7.2. A stationary information source as above is called ergodic if for every simple random variable f on the set of sequences, the time mean of f almost surely equals the space mean of f. More precisely, the set of sequences x for which

(34) A_\infty f(x) = \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} f(σ^k x) = E(f)

(in the sense that the limit exists and equals E(f)) has probability 1.

In fact, it can be shown that in order to check whether or not a source is ergodic, it is enough to check the definition for random variables f which are the characteristic functions of cylinder sets. Given a word a = a_0 a_1 a_2 ... a_{r−1}, define

(35) f_a(x) = 1 if x_0 x_1 ... x_{r−1} = a, and f_a(x) = 0 otherwise.

Ergodicity is then seen to be equivalent to requiring that in almost every sequence, every word appears with limiting frequency equal to the probability of the cylinder set defined by that word. Here "almost every sequence" means a set of sequences which has probability one.

Example 7.3. The Bernoulli systems defined above are all ergodic. This is a strong version of Jakob Bernoulli's Law of Large Numbers (1713).

What kinds of sources are not ergodic, you ask? It's easiest to give examples if one knows that ergodicity is equivalent to a kind of indecomposability of the probability space of sequences.

Example 7.4. Let us consider an information source which puts out one-sided sequences on the alphabet D = {0, 1}. Let us suppose that the probability measure P governing the outputs is such that with probability 1/2 we get a constant string of 0's, and otherwise we get a string of 0's and 1's coming independently with equal probabilities.
If we consider the simple random variable f_0, which gives the value 1 if x_0 = 0 and otherwise gives the value 0, we see that on a set of probability 1/2 the time mean of f_0 is 1, while on another set of probability 1/2 it is 1/2 (assuming the result stated in Example 7.3). Thus, no matter the value of E(f_0), we cannot possibly have A_\infty f_0 = E(f_0) almost surely.
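The failure of ergodicity in Example 7.4 shows up immediately in simulation. A rough sketch, under the assumption that long finite strings stand in for infinite sequences (the helper names `sample_sequence` and `time_mean_f0` are invented for illustration):

```python
import random

def sample_sequence(n, rng):
    """Draw one length-n output of the mixed source of Example 7.4:
    with probability 1/2 the all-zeros string, otherwise fair i.i.d. bits."""
    if rng.random() < 0.5:
        return [0] * n
    return [rng.randint(0, 1) for _ in range(n)]

def time_mean_f0(x):
    """Time average of f_0 along the orbit of x, i.e. the
    frequency of the symbol 0 in x."""
    return sum(1 for b in x if b == 0) / len(x)

rng = random.Random(1)
means = [time_mean_f0(sample_sequence(10_000, rng)) for _ in range(8)]
# Each entry is near 1 (constant branch) or near 1/2 (fair-coin branch);
# the time mean never settles at the single space mean E(f_0).
print([round(m, 2) for m in means])
```

The time means cluster at two different values depending on which "component" of the source produced the sample, which is exactly the indecomposability failure described above.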
Exercise 7.1. Calculate the space mean of the random variable f_0 in the preceding example.

Exercise 7.2. Calculate the space mean and time mean of the random variable f_1 in the preceding example (see Formula (35)).
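The frequency interpretation of ergodicity (Example 7.3, Formula (35)) can also be checked empirically for a Bernoulli source. A minimal sketch; the word 01, the parameter p = 1/3, and the helper `word_frequency` are illustrative choices, not from the text:

```python
import random

def word_frequency(x, a):
    """Empirical time mean of f_a: the fraction of positions k
    at which the word a appears in the string x."""
    r = len(a)
    hits = sum(1 for k in range(len(x) - r + 1) if x[k:k + r] == a)
    return hits / (len(x) - r + 1)

rng = random.Random(42)
p = 1 / 3  # P(0) = 1/3, P(1) = 2/3
x = "".join("0" if rng.random() < p else "1" for _ in range(100_000))

# The cylinder set [01] has probability P(0) * P(1) = (1/3)(2/3) = 2/9.
print(word_frequency(x, "01"))  # close to 2/9 ≈ 0.222
```

In a long sample from this ergodic source, each word's empirical frequency lands near the probability of its cylinder set, as the Law of Large Numbers predicts.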