STAT 315: HOW TO CHOOSE A DISTRIBUTION FOR A RANDOM VARIABLE

TROY BUTLER

1. Random variables and distributions

We are often presented with descriptions of problems involving some level of uncertainty about the outcome of a physical experiment or recorded data. We find it useful to quantify the outcomes with real numbers. The function (or map, or rule) that defines which real number gets associated with which particular outcome is what we call a random variable (rv), often denoted by a capital letter such as X or Y (the generic choices).

Random variables are not random! The only thing uncertain about them is the input, which comes from a yet-to-be-performed physical experiment or a datum recorded from a not-yet-chosen member of a population! A random variable is NOT RANDOM! It is a well-defined function! For example, we might say that we are interested in the heights of students in this class. I would represent the recorded height as the output of the random variable X. The only reason I am unsure of the outputs of X is that I do not know who will be chosen; once a student is chosen, there is nothing random about that student's height.

Once we have settled upon what the random variable is (i.e., how we map outcomes from a sample space, which is nothing more than a domain containing all the possible outcomes, to the real numbers), we are interested in the distribution of this random variable. Specifically, we want to know how to compute probabilities of events defined by sets of real numbers. An event defined in terms of the random variable belonging to some set of real numbers is nothing more than the event consisting of all outcomes in the sample space that get mapped into this set. For example, we might want to know the probability that the height of a student in this class is less than 6 ft.
Again, letting X denote the height of a student in this class (recorded in units of ft), we are asking about P(X < 6), which is read as "the probability of the event defined by the random variable being less than 6." We are really asking about the proportion of students in this class whose measured heights are less than 6 ft. The list of all students in this class is the list of all the outcomes defining the sample space, and we map a given student to that student's height.

As a very specific example, suppose Peyton Manning is a student in the class, he is exactly 6.47 ft tall, and no one else is this height. If we ask, "what is the probability of the event that X = 6.47?", then we are really asking, "what is the probability that Peyton Manning will randomly be selected from the class?" If we ask, "what is the probability of the event that X > 6.47?", then we are really asking, "what is the probability that a student taller than Peyton
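A minimal numerical sketch of this idea: with a finite sample space of equally likely outcomes, the probability of an event about X is just the proportion of outcomes mapped into the event's set. The roster and heights below are made up purely for illustration (Peyton Manning's height matches the running example).

```python
# A hypothetical class roster mapped to heights in ft. The rv X is just the
# lookup "student -> height"; an event about X corresponds to the set of
# students whose heights land in the given set of real numbers.
heights = {
    "Avery": 5.4, "Blake": 6.1, "Casey": 5.9,
    "Devon": 5.7, "Peyton Manning": 6.47,
}

def prob(event):
    """P(event) = fraction of equally likely outcomes s with event(X(s)) true."""
    return sum(event(x) for x in heights.values()) / len(heights)

print(prob(lambda x: x < 6))      # P(X < 6)    -> 3/5 = 0.6
print(prob(lambda x: x == 6.47))  # P(X = 6.47) -> 1/5 = 0.2
print(prob(lambda x: x > 6.47))   # P(X > 6.47) -> 0.0
```

Note that P(X = 6.47) here is exactly the probability of selecting Peyton Manning, since he is the only outcome mapped to 6.47.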
Manning will be randomly selected from the class?" Thus, questions about the probability of rv X having certain real-numbered values are really questions about the probability of certain outcomes in the sample space.

The last sentence in the above paragraph implies that if we want to determine the probability distribution of a random variable X, then we must consider the underlying probability of the sample space it acts upon! How do we determine the probabilities of the various outcomes in this sample space? In what follows, we use S (read "script S") to denote the sample space and s ∈ S to denote a particular outcome (or sample) s in this sample space. Uppercase letters denote random variables and their lowercase counterparts represent particular real numbers; for example, X(s) = x indicates that outcome s is mapped to real number x by the rv X.

2. Discrete random variables and their distributions

2.1. Bernoulli random variables. Consider an experiment with the following two outcomes: success (S) and failure (F). Thus, S = {S, F}. Define the rv X : S → ℝ by X(S) = 1 and X(F) = 0. We define a Bernoulli random variable as any rv whose only possible values are 0 and 1. A Bernoulli trial is an experiment that will result in one of two outcomes, a success or a failure. The canonical example of a Bernoulli trial is a coin toss, where the coin landing heads up is a success with success probability 0 ≤ ρ ≤ 1 and landing tails up is a failure with failure probability 1 − ρ. The pmf for the Bernoulli rv X : {S, F} → {0, 1} defined above is given by p(1) = ρ and p(0) = 1 − ρ. We often write X ~ Bernoulli(ρ) to indicate that the rv X has a Bernoulli distribution with success probability ρ. Bernoulli rvs and the concept of independent, identically distributed (i.i.d. or iid) Bernoulli trials are critical in many areas of probability theory, including the development of the binomial distribution.
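The Bernoulli setup above can be sketched in a few lines of code; the value ρ = 0.5 below is an assumed fair-coin probability chosen only for illustration.

```python
import random

# Minimal sketch of a Bernoulli(rho) rv: its pmf and one simulated trial.
rho = 0.5  # assumed success probability (fair coin), for illustration only

def bernoulli_pmf(x, rho):
    """p(1) = rho, p(0) = 1 - rho, and 0 for any other value."""
    return {1: rho, 0: 1 - rho}.get(x, 0.0)

def bernoulli_trial(rho, rng=random):
    """Map the outcome of one trial (success/failure) to 1/0."""
    return 1 if rng.random() < rho else 0

# The two pmf values must sum to 1, whatever rho is.
assert bernoulli_pmf(1, rho) + bernoulli_pmf(0, rho) == 1.0
print(bernoulli_trial(rho))  # a single simulated 0/1 outcome
```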
Any rv X (continuous or discrete) can be used to define a Bernoulli rv simply by identifying an event of interest. For example, we can let X denote the price paid by a first-time home buyer in the greater Denver area. Clearly X is not a Bernoulli rv, as there are lots of prices that could be paid. However, if we decide that we are interested only in determining the probability that a first-time home buyer paid less than $250,000, then we have defined a brand-spanking-new Bernoulli rv that we call Y (since X is already taken). Here, Y is really a function of X, and since X is a function on the sample space defined by first-time home buyers, so is Y. If X < $250,000, then Y = 1; otherwise Y = 0. The probability of success is P(X < $250,000). All that is necessary to define a Bernoulli rv is a rule that separates the sample space into two disjoint sets, where one of those sets gets mapped to 1 and the other gets mapped to 0.
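The "threshold an existing rv" construction can be sketched directly. The prices below are made-up illustrative values, not real Denver data, and the empirical proportion of 1's only estimates P(X < $250,000).

```python
# Sketch: turning any rv X into a Bernoulli rv Y via an event of interest.
# Hypothetical prices paid by six first-time buyers (illustrative only).
prices = [198_000, 310_000, 245_000, 275_000, 230_000, 405_000]

def to_bernoulli(x, threshold=250_000):
    """Y = 1 exactly when the event {X < threshold} occurs, else Y = 0."""
    return 1 if x < threshold else 0

ys = [to_bernoulli(p) for p in prices]
success_prob = sum(ys) / len(ys)   # sample estimate of P(X < $250,000)
print(ys, success_prob)            # [1, 0, 1, 0, 1, 0] 0.5
```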
2.2. Binomial random variables. Let X be the sum of n i.i.d. (independent, identically distributed) Bernoulli trials with success probability ρ; then X ~ Binomial(n, ρ) with pmf

    b(x; n, ρ) = (n choose x) ρ^x (1 − ρ)^(n−x)   for x ∈ {0, 1, 2, ..., n},

and b(x; n, ρ) = 0 otherwise.

What does S look like? Suppose there are 3 Bernoulli trials defining the sample space; then

    S := {SSS, SSF, SFS, FSS, SFF, FSF, FFS, FFF}

defines all 8 distinct possible outcomes of the experiment. The rv X maps s ∈ S to the number of S's showing up in the element s (keep the S's and s's straight here). For example, if s = SSS then X(s) = 3; if s = SFS then X(s) = 2, but s = SSF also has X(s) = 2, because the rv X does not care about the order in which the S's appear, only their number (that is the rule that defines X).

We use B(x; n, ρ) to denote the cdf of a binomial rv X. This does not give the probability that X = x (that is given by p(x), which is shorthand for the pmf evaluated at x); it gives the probability of the event X ≤ x.

Given a dichotomous population (meaning a population defined by two disjoint sets satisfying some "rule") of size N, if we draw a sample of size n from this population without replacement, then the rv X counting the number of successes in the n samples does not have a binomial distribution. Why? The trials within the experiment are not independent. However, if n/N < 0.05, then we can reasonably approximate the distribution of X by a binomial distribution. In the example of first-time home buyers, if we randomly sample 8 names from a list of first-time home buyers (and assume this list has N names so that 8/N < 0.05), and we want to know the probability that at least 3 of them paid less than $250,000, then we are asking a question about a rv that has a binomial distribution with n = 8 and probability of success P(X < $250,000), where X is the price paid as described previously.
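The pmf above can be checked against a brute-force enumeration of the sample space, and the "at least 3 of 8" question can then be answered numerically. The success probability 0.4 below is an assumed value standing in for P(X < $250,000), purely for illustration.

```python
from itertools import product
from math import comb

# Binomial pmf b(x; n, rho), checked by enumerating the 2^n outcomes of
# n Bernoulli trials (the sample space S for n = 3, coded as 0/1 tuples).
def binom_pmf(x, n, rho):
    return comb(n, x) * rho**x * (1 - rho)**(n - x) if 0 <= x <= n else 0.0

n, rho = 3, 0.5
for x in range(n + 1):
    # X(s) counts the successes in s; SSF and SFS both map to 2, so all
    # orderings with the same count contribute the same probability.
    brute = sum(
        rho**sum(s) * (1 - rho)**(n - sum(s))
        for s in product([0, 1], repeat=n) if sum(s) == x
    )
    assert abs(brute - binom_pmf(x, n, rho)) < 1e-12

# "At least 3 of 8 paid under $250,000", with an assumed success
# probability of 0.4 standing in for P(X < $250,000):
p_at_least_3 = sum(binom_pmf(x, 8, 0.4) for x in range(3, 9))
print(round(p_at_least_3, 4))  # 0.6846
```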
This new rv can be called Y (but if you decide to list the intermediate step of defining a Bernoulli rv and use Y to denote that Bernoulli rv as was done previously, then you should call the binomial rv something else, like W, to avoid confusion).

2.3. Poisson random variables. The Poisson distribution is used to describe the probability of x events occurring in a fixed interval of time or space, where λ represents the mean frequency per unit of time/space. For example, the number of cars passing through an intersection in a fixed unit of time, the number of phone calls routed through a cell tower in a given hour, or the number of chocolate chips per cookie baked from a big batch are often appropriately modeled by Poisson random variables.
A random variable X follows the Poisson distribution with parameter λ (λ > 0) if the pmf of X is given by

    p(x; λ) = e^(−λ) λ^x / x!   for x ∈ {0, 1, 2, 3, ...},

and p(x; λ) = 0 otherwise.

Remark 1. Given a binomial pmf b(x; n, ρ), if we let n → ∞ and ρ → 0 such that nρ → λ > 0, then b(x; n, ρ) → p(x; λ).

The above remark implies that even though the binomial distribution might be the correct distribution for the specific problem you are considering, it may be more computationally practical to use a Poisson distribution to approximate the answers. However, this approximation only holds in certain cases, and we use the rule of thumb that it holds when n > 50 and nρ < 5. In this case, we approximate the binomial distribution by the Poisson distribution with λ = nρ.

Theorem 1. If the numbers of events occurring in disjoint time intervals are independent, with a mean rate of λ events per unit interval, then X = the number of events occurring in t such intervals follows a Poisson distribution with mean λt.

Returning again to the example of first-time home buyers, we might want to model the number of first-time home buyers in any year. We would have to know, or be given, data over the years from which to estimate the mean number of first-time home buyers to use as the parameter of the Poisson distribution. Suppose we have such a model distribution and the mean number of first-time home buyers in any 12-month span is 24,000; if we now want to model the number of first-time home buyers in any 6-month span, then it is reasonable to take a Poisson distribution with parameter 12,000 (by the above theorem).

2.4. Non-named distributions. When given a description of a finite (or countable) sample space and a rv X that does not conform to the types of descriptions that the named distributions above model, we must use the description along with the rules of probability/logic/etc. to determine the distribution of X (meaning we must determine what the pmf is).
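To close out the discrete distributions, the Poisson pmf, the binomial-to-Poisson rule of thumb, and the λt scaling from Theorem 1 can all be checked numerically. The values n = 100 and ρ = 0.03 below are chosen arbitrarily to satisfy n > 50 and nρ < 5.

```python
from math import comb, exp, factorial

# Poisson pmf p(x; lam) and the binomial pmf it approximates when
# n > 50 and n*rho < 5, taking lam = n*rho.
def poisson_pmf(x, lam):
    return exp(-lam) * lam**x / factorial(x) if x >= 0 else 0.0

def binom_pmf(x, n, rho):
    return comb(n, x) * rho**x * (1 - rho)**(n - x)

n, rho = 100, 0.03          # n > 50 and n*rho = 3 < 5
lam = n * rho
for x in range(6):
    # The two pmfs agree to within 0.01 at every x checked here.
    assert abs(binom_pmf(x, n, rho) - poisson_pmf(x, lam)) < 0.01

# Theorem 1 scaling: a mean of 24,000 buyers per 12 months gives a
# 6-month count that is Poisson with mean 24_000 * (6 / 12).
lam_year = 24_000
lam_half_year = lam_year * 6 / 12
print(lam_half_year)  # 12000.0
```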
3. Continuous random variables and their distributions

The common continuous distributions used in this class are the uniform, exponential, and normal/Student t distributions. It will almost always be immediately clear from context which one applies, as terms like "uniform" or "equally likely" show up when describing the uniform distribution, and you will almost always be told whether or not the exponential or normal distribution is used to model the distribution of a particular rv. The exception is when we consider statistics (quick: what is a statistic?). Specifically, we often look at sample means or sample proportions as statistics, and with a large enough sample size, the distributions of these statistics are approximately normal (Student t is approximately normal) by the Central Limit
Theorem (CLT). You will know which distribution to use in these cases based on the sample size and the use of either the exact or approximate standard deviation, as we discuss in Chapter 7.
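A small simulation sketch of the CLT's claim: averages of n draws from a skewed (exponential) population look roughly normal, with mean near the population mean and standard deviation near σ/√n. The sample size, repetition count, and rate below are arbitrary illustrative choices.

```python
import random
import statistics

# Simulate many sample means of n = 40 exponential draws; the exponential
# with rate 1 has population mean and standard deviation both equal to 1.
random.seed(0)
n, reps, rate = 40, 5000, 1.0
means = [
    statistics.fmean(random.expovariate(rate) for _ in range(n))
    for _ in range(reps)
]
# The distribution of the sample mean concentrates around 1 with spread
# close to sigma / sqrt(n) = 1 / sqrt(40) ≈ 0.158, as the CLT predicts.
print(round(statistics.fmean(means), 2))   # close to 1.0
print(round(statistics.stdev(means), 2))   # close to 0.16
```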