# NON-UNIFORM RANDOM VARIATE GENERATION

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 NON-UNIFORM RANDOM VARIATE GENERATION Luc Devroye School of Computer Science McGill University Abstract. This chapter provides a survey of the main methods in non-uniform random variate generation, and highlights recent research on the subject. Classical paradigms such as inversion, rejection, guide tables, and transformations are reviewed. We provide information on the expected time complexity of various algorithms, before addressing modern topics such as indirectly specified distributions, random processes, and Markov chain methods. Authors address: School of Computer Science, McGill University, 3480 University Street, Montreal, Canada H3A 2K6. The authors research was sponsored by NSERC Grant A3456 and FCAR Grant 90-ER-0291.

2 1. The main paradigms The purpose of this chapter is to review the main methods for generating random variables, vectors and processes. Classical workhorses such as the inversion method, the rejection method and table methods are reviewed in section 1. In section 2, we discuss the expected time complexity of various algorithms, and give a few examples of the design of generators that are uniformly fast over entire families of distributions. In section 3, we develop a few universal generators, such as generators for all log concave distributions on the real line. Section 4 deals with random variate generation when distributions are indirectly specified, e.g, via Fourier coefficients, characteristic functions, the moments, the moment generating function, distributional identities, infinite series or Kolmogorov measures. Random processes are briefly touched upon in section 5. Finally, the latest developments in Markov chain methods are discussed in section 6. Some of this work grew from Devroye (1986a), and we are carefully documenting work that was done since More recent references can be found in the book by Hörmann, Leydold and Derflinger (2004). Non-uniform random variate generation is concerned with the generation of random variables with certain distributions. Such random variables are often discrete, taking values in a countable set, or absolutely continuous, and thus described by a density. The methods used for generating them depend upon the computational model one is working with, and upon the demands on the part of the output. For example, in a ram (random access memory) model, one accepts that real numbers can be stored and operated upon (compared, added, multiplied, and so forth) in one time unit. Furthermore, this model assumes that a source capable of producing an i.i.d. (independent identically distributed) sequence of uniform [0, 1] random variables is available. This model is of course unrealistic, but designing random variate generators based on it has several advantages: first of all, it allows one to disconnect the theory of non-uniform random variate generation from that of uniform random variate generation, and secondly, it permits one to plan for the future, as more powerful computers will be developed that permit ever better approximations of the model. Algorithms designed under finite approximation limitations will have to be redesigned when the next generation of computers arrives. For the generation of discrete or integer-valued random variables, which includes the vast area of the generation of random combinatorial structures, one can adhere to a clean model, the pure bit model, in which each bit operation takes one time unit, and storage can be reported in terms of bits. Typically, one now assumes that an i.i.d. sequence of independent perfect bits is available. In this model, an elegant information-theoretic theory can be derived. For example, Knuth and Yao (1976) showed that to generate a random integer X described by the probability distribution {X = n} = p n, n 1, any method must use an expected number of bits greater than the binary entropy of the distribution, p n log 2 (1/p n ). n They also showed how to construct tree-based generators that can be implemented as finite or infinite automata to come within three bits of this lower bound for any distribution. While this theory is elegant and theoretically important, it is somewhat impractical to have to worry 2

3 about the individual bits in the binary expansions of the p n s, so that we will, even for discrete distributions, consider only the ram model. Noteworthy is that attempts have been made (see, e.g., Flajolet and Saheb, 1986) to extend the pure bit model to obtain approximate algorithms for random variables with densities The inversion method. For a univariate random variable, the inversion method is theoretically applicable: given the distribution function F, and its inverse F inv, we generate a random variate X with that distribution as F inv (U), where U is a uniform [0, 1] random variable. This is the method of choice when the inverse is readily computable. For example, a standard exponential random variable (which has density e x, x > 0), can be generated as log(1/u). The following table gives some further examples. Name Density Distribution function Random variate Exponential e x, x > 0 1 e x log(1/u) Weibull (a), a > 0 ax a 1 e xa, x > 0 1 e xa (log(1/u)) 1/a Gumbel e x e e x e e x log log(1/u) Logistic 1 2+e x +e x 1 1+e x log((1 U)/U) Cauchy 1 π(1+x 2 ) 1/2 + (1/π) arctan x tan(πu) Pareto (a), a > 0 a x a+1, x > 1 1 1/x a 1/U 1/a Table 1: Some densities with distribution functions that are explicitly invertible. The fact that there is a monotone relationship between U and X has interesting benefits in the area of coupling and variance reduction. In simulation, one sometimes requires two random variates from the same distribution that are maximally anti-correlated. This can be achieved by generating the pair (F inv (U), F inv (1 U)). In coupling, one has two distribution functions F and G and a pair of (possibly dependent) random variates (X, Y ) with these marginal distribution functions is needed so that some appropriate metric measuring distance between them is minimized. For example, the Wasserstein metric d 2 (F, G) is the minimal value over all couplings of (X, Y ) of {(X Y ) 2 }. That minimal coupling occurs when (X, Y ) = (F inv (U), G inv (U)) (see, e.g., Rachev, 1991). Finally, if we wish to simulate the maximum M of n i.i.d. random variables, each distributed as X = F inv (U), then noting that the maximum of n i.i.d. uniform [0, 1] random variables is distributed as U 1/n, we see that M can be simulated as F inv (U 1/n ). 3

4 1.2. Simple transformations. We call a simple transformation one that uses functions and operators that are routinely available in standard libraries, such as the trigonometric, exponential and logarithmic functions. For example, the inverse of the normal and stable distribution functions cannot be computed using simple transformations of one uniform random variate. For future reference, the standard normal density is given by exp( x 2 /2)/ 2π. However, the next step is to look at simple transformations of a k uniform [0, 1] random variates, where k is either a small fixed integer, or a random integer with a small mean. It is remarkable that one can obtain the normal and indeed all stable distributions using simple transformations with k = 2. In the Box-Müller method (Box and Müller, 1958), a pair of independent standard normal random variates is obtained by setting ( (X, Y ) = log(1/u1 ) cos(2πu 2 ), ) log(1/u 1 ) sin(2πu 2 ), where U 1, U 2 are independent uniform [0, 1] random variates. For the computational perfectionists, we note that the random cosine can be avoided: just generate a random point in the unit circle by rejection from the enclosing square (more about that later), and then normalize it so that it is of unit length. Its first component is distributed as a random cosine. There are many other examples that involve the use of a random cosine, and for this reason, they are called polar methods. We recall that the beta (a, b) density is x a 1 (1 x) b 1, 0 x 1, B(a, b) where B(a, b) = Γ(a)Γ(b)/Γ(a + b). A symmetric beta (a, a) random variate may be generated as ( ) U 2 2a 1 1 cos(2πu 2 ) 2 (Ulrich, 1984), where a 1/2. Devroye (1996) provided a recipe valid for all a > 0: S ( U 1 a 1 1 ) cos 2 (2πU 2) where S is a random sign. Perhaps the most striking result of this kind is due to Bailey (1994), who showed that ( ) a U 2 a 1 1 cos(2πu 2 ) has the Student t density (invented by William S. Gosset in 1908) with parameter a > 0: 1 ab(a/2, 1/2) ( 1 + x 2 a ) a+1 2,, x. Until Bailey s paper, only rather inconvenient rejection methods were available for the t density. There are many random variables that can be represented as ψ(u)e α, where ψ is a function, U is uniform [0, 1], α is a real number, and E is an independent exponential random variable. These lead to simple algorithms for a host of useful yet tricky distributions. A random 4

5 variable S α,β with characteristic function ( ϕ(t) = exp t α iπβ(α 2 ) α>1) sign(t) 2 is said to be stable with parameters α (0, 2] and β 1. Its parameter α determines the size of its tail. Using integral representations of distribution functions, Kanter (1975) showed that for α < 1, S α,1 is distributed as where ψ(u) = ψ(u)e 1 1 α, ( ) 1 sin(απu) α sin(πu) ( sin((1 α)πu) sin(απu) For general α, β, Chambers, Mallows and Stuck (1976) showed that it suffices to generate it as where ( cos(π((α 1)u + αθ)/2) ψ(u) = cos(πu/2) ψ(u 1/2)E 1 1 α, ) 1 ( α ) 1 α α sin(πα(u + θ)/2) cos(π((α 1)u + αθ)/2) Zolotarev (1959, 1966, 1981, 1986) has additional representations and a thorough discussion on these families of distributions. The paper by Devroye (1990) contains other examples with k = 3, including S α,0 E 1 α, which has the so-called Linnik distribution (Linnik, 1962) with characteristic function ϕ(t) = t α, 0 < α 2. See also Kotz and Ostrovskii (1996). It also shows that S α,1 E 1 α, has the Mittag-Leffler distribution with characteristic function ϕ(t) = ( it) α, 0 < α 1. Despite these successes, some distributions seem hard to treat with any k-simple transformation. For example, to date, we do not know how to generate a gamma random variate (i.e., a variate with density x a 1 e x /Γ(a) on (0, )) with arbitrary parameter a > 0 using only a fixed finite number k of uniform random variates and simple transformations.. ). 5

6 1.3. Simple methods for multivariate distributions. For random vectors (X 1,..., X d ), the inversion method proceeds by peeling off the dimensions. That is, we first generate X d from its marginal distribution, then X d 1 given X d, using a conditional distribution, and so forth. There is of course no reason to employ conditioning in that rigid manner. For example, if a distribution has a density that is a function of i x2 i only, then it is advantageous to find the marginal of i X2 i, generate a random variate from this marginal distribution, and then generate a random variate uniformly distributed on the surface of the ball with the given random radius. And to generate a uniformly distributed point on the surface of the unit ball, just generate d i.i.d. standard normal random variates X 1,..., X d, and normalize the length to one. Sometimes, one wishes to generate random vectors with a certain dependence structure. This can be captured by various measures of correlation. For a survey of random variate generators in this vein, see Devroye (1986a). An interesting new area of research is related to the development of generators with a given copula structure. The copula of two uniform [0, 1] random variables X and Y is given by the joint distribution function C(x, y) = {X x, Y y}. We say that two arbitrary continuous random variables X and Y (with distribution functions F and G, respectively) have the copula structure given by C if F (X) and G(Y ) (which are both uniform [0, 1]) have joint distribution function C. Thus, we need methods for generating random vectors with distribution function C. Various interesting families of copulas and some seedlings of algorithms, are given in Nelsen s book (1999) and in Chapter 5 of the present volume Inversion for integer-valued random variables. For integer-valued random variables with {X = n} = p n, n 0, the inversion method is always applicable: X 0 Generate U uniform [0, 1] S p 0 (S holds the partial sums of the p n s) while U > S do X X + 1, S S + p X return X The expected number of steps here is {X + 1}. Improvements are possible by using data structures that permit one to invert more quickly. When there are only a finite number of values, a binary search tree may help. Here the leaves correspond to various outcomes for X, and the internal nodes are there to guide the search by comparing U with appropriately picked thresholds. If the cost of setting up this tree is warranted, then one could always permute the leaves to make this into a Huffman tree for the weights p n (Huffman, 1952), which insures that the expected time to find a leaf is not more than one plus the binary entropy, p n log 2 (1/p n ). n In any case, this value does not exceed log 2 N, where N is the number of possible values X can take. Another way of organizing this is to store an array with the partial sums p 1, p 1 + p 2,..., p p N = 1, and to search for the interval in which U falls by binary search, 6

7 which would take O(log 2 N) time. As mentioned above, the Huffman tree is optimal, and thus better, but requires more work to set up Guide tables. Hash tables, or guide tables (Chen and Asau, 1974), have been used to accelerate the inversion even further. Given the size of the universe, N, we create a hash table with N entries, and store in the i-th entry the value of X if U were i/n, 0 i < N. Then U gets hashed to Z = NU. Then return to the table of partial sums and start the search at p p Z. It is easy to show that the expected time, table set-up excluded, is bounded by a constant uniformly over all distributions on N values. The table can be constructed in O(N) time. Walker (1974, 1977) showed that one can construct in time O(N) a table (q i, r i ), 1 i N, such that the following method works: pick a uniform integer Z from 1, 2,..., N. Generate a uniform [0, 1] random variate U. If U q Z, then return Z, else return r Z. The values r i are called the aliases, and the method is now known as the alias method. If a distribution is fixed once and for all, and N is such that the storage of the table is of no concern, then the alias method is difficult to beat Mixture methods. Densities that can be written as mixtures p n f n (x), n with nonnegative weights p n summing to one, are trivially dealt with by first generating a random index Z with discrete distribution {p n }, and then generating a random variate with density f Z. They do, therefore, not require any special treatment. Sometimes, we have more intricate mixtures, when, e.g., f(x) = {g Z (x)}, where Z is a random variable, and g Z is a density in which Z is a parameter. Clearly, it suffices to generate Z first and then to generate a random variate from g Z. The textbook example here is N/ G a, where N is standard normal, and G a is a gamma random variable with parameter a, independent of N. The ratio has the Student t distribution with parameter a. Other examples relevant for further on are U/V, a ratio of two i.i.d. uniform [0, 1] random variates, which has density (1/2) min(1, 1/x 2 ), and UV, which has density log(1/x) on (0, 1]. Gamma random variates of parameter less than one are often cumbersome, but we know that G a L = U 1/a G a+1 where the notation is as above and all variates are independent on the right. Finally, a Cauchy random variate can be obtained as N 1 /N 2, the ratio of two independent standard normal random variates. Mixtures are also useful in discrete settings. For example, the negative binomial distribution (the number of failures before the n-th success) with parameters (n, p), n 1, p (0, 1), is given by the probabilities ( ) n + k 1 p k = (1 p) k p n, k 0. k One can check that this can be generated as a Poisson (Y ) random variate where Y in turn is (1 p)/p times a gamma (n) random variate. A special case is the geometric (p) distribution, 7

8 which is negative binomial (1, p). Here, though, it is better to use the inversion method which can be made explicit by the truncation operator: it suffices to take where U is uniform [0, 1]. log 1 p U, 1.7. The rejection method. Von Neumann (1951) proposed the rejection method, which uses the notion of a dominating measure. Let X have density f on d. Let g be another density with the property that for some finite constant c 1, called the rejection constant, f(x) cg(x), x d. For any nonnegative integrable function h on d, define the body of h as B h = {(x, y) : x d, 0 y h(x)}. Note that if (X, Y ) is uniformly distributed on B h, then X has density proportional to h. Vice versa, if X has density proportional to h, then (X, Uh(X)), where U is uniform [0, 1] and independent of X, is uniformly distributed on B h. These facts can be used to show the validity of the rejection method: repeat Generate U uniformly on [0, 1] Generate X with density g until Ucg(X) f(x) return X The expected number of iterations before halting is c, so the rejection constant must be kept small. This method requires some analytic work, notably to determine c, but one attractive feature is that we only need the ratio f(x)/(cg(x)), and thus, cumbersome normalization constants often cancel out Rejection: a simple example. The rejection principle also applies in the discrete setting, so a few examples follow to illustrate its use in all settings. We begin with the standard normal density. The start is an inequality such as e x2 /2 e α2 /2 α x. The area under the dominating curve is e α2 /2 2/α, which is minimized for α = 1. Generating a random variate with the Laplace density e x can be done either as SE, where S is a random sign, and E is exponential, or as E 1 E 2, a difference of two independent exponential random variables. The rejection algorithm thus reads: 8

9 repeat Generate U uniformly on [0, 1] Generate X with with the Laplace density until Ue 1/2 X e X2 /2 return X However, taking logarithms in the last condition, and noting that log(1/u) is exponential, we can tighten the code using a random sign S, and two independent exponentials, E 1, E 2 : Generate a random sign S repeat Generate E 1, E 2 until 2E 2 > (E 1 1) 2 return X SE 1 It is easy to verify that the rejection constant (the expected number of iterations) is 2e/π Rejection: a more advanced example. Assume that f is a monotonically decreasing density on [0, 1], and define M = f(0). We could apply rejection with M as a bound: repeat Generate U uniformly on [0, 1] Generate X uniformly on [0, 1] until UM f(x) return X The expected number of iterations is M. However, the algorithm does not make any use of the monotonicity. As we know that f(x) min(m, 1/x), and the integral of the upper bound (or the rejection constant) is 1 + log M, we can exponentially speed up the performance by basing our rejection on the latter bound. It is a simple exercise to show that the indefinite integral of the upper bound is { Mx (0 x 1/M) 1 + log(mx) (1/M x 1). A random variate can easily be generated by inversion of a uniform [0, 1] random variate U: if U 1/(1 + log M), return (1 + log M)U/M, else return (1/M) exp((1 + log M)U 1). Thus, we have the improved algorithm: 9

10 repeat Generate U, V independently and uniformly on [0, 1]. Set W = V (1 + log M). If W 1 then set X W/M else set X exp(w U 1)/M until U min(m, 1/X) f(x) return X It is possible to beat even this either by spending more time setting up (a one-time cost) or by using additional information on the density. In the former case, partition [0, 1] into k equal intervals, and invest once in the calculation of f(i/k), 0 i < k. Then the histogram function of height f(i/k) on [i/k, (i + 1)/k) dominates f and can be used for rejection purposes. This requires the bin probabilities p i = f(i/k)/ k 1 j=0 f(j/k). Rejection can proceed as follows: repeat Generate U, V independently and uniformly on [0, 1] Generate Z on {0, 1,..., k 1} according to the probabilities {p i }. Set X = (Z + V )/k. until Uf(Z/k) f(x) return X Here the area under the dominating curve is at most one plus the area sandwiched between f and it. By shifting that area over to the left, bin by bin, one can bound it further by 1+M/k. Thus, we can adjust k to reduce the rejection constant at will. For example, a good choice would be k = M. The expected time per variate would be O(1) if we can generate the random integer Z in time O(1). This can be done, but only at the expense of the set-up of a table of size k (see above). In one shot situations, this is a prohibitive expense. In Devroye (1986), methods based on partitioning the density in slabs are called strip methods. Early papers in this direction include Ahrens and Dieter (1989), Ahrens (1993, 1995), Marsaglia and Tsang (1984) and Marsaglia, Maclaren and Bray (1964). Assume next that we also know the mean µ of f. Then, even dropping the requirement that f be supported on [0, 1], we have µ = 0 yf(y)dy f(x) x 0 ydy = f(x)x 2 /2. Hence, ( f(x) min M, 2µ ). x 2 The integral of the upper bound from 0 to x is { Mx 0 x < 2µ/M 2 2µM 2µ x 2µ/M. x The area under the upper bound is therefore 2 2µM. To generate a random variate with density proportional to the upper bound, we can proceed in various ways, but it is easy to check that 2µ/MU 1 /U 2 fits the bill, where U 1, U 2 are independent uniform [0, 1] random variables. Thus, we summarize: 10

11 repeat Generate U, V, W independently and uniformly on [0, 1]. Set X = 2µ/MV/W. until UM min (1, (W/V ) 2 ) f(x) return X This example illustrates the development of universal generators, valid for large classes of densities, as a function of just a few parameters. The reader will be able to generalize the above example to unimodal densities with known r-th moment about the mode. The example given above applies with trivial modifications to all unimodal densities, and the expected time is bounded by a constant times f(m) { X m }, where m is the mode. For example, the gamma density with parameter a 1 has mean a, and mode at a 1, with f(a 1) C/ a for some constant C, and { X (a 1) } C a for another constant C. Thus, the expected time taken by our universal algorithm takes expected time bounded by a constant, uniformly over a 1. We say that the algorithm is uniformly fast over this class of densities The alternating series method. To apply the rejection method, we do not really need to know the ratio f(x)/(cg(x)) exactly. It suffices that we have an approximation φ n (x) that tends, as n, to f(x)/(cg(x)), and for which we know a monotone error bound ɛ n (x) 0. In that case, we let n increase until for the first time, either (in which case we accept X), or U φ n (X) ɛ n (X) U φ n (X) + ɛ n (X) (in which case we reject X). Equivalently, we have computable bounds ξ n (x) and ψ n (x) with the property that ξ n (x) f(x)/(cg(x)) and ψ n (x) f(x)/(cg(x)) as n. This approach is useful when the precise computation of f is impossible, e.g., when f is known as infinite series or when f can never be computed exactly using only finitely many resources. It was first developed for the Kolmogorov-Smirnov limit distribution in Devroye (1981a). For another use of this idea, see Keane and O Brien s Bernoulli factory (1994). 11

12 repeat Generate U uniformly on [0, 1] Generate X with density g Set n = 0 repeat n n + 1 until U ξ n (X) or U ψ n (X) until U ξ n (X) return X The expected number of iterations in the outer loop is still c, as in the rejection method. However, to take the inner loop into account, let N be the largest index n attained in the inner loop. Note that N > t implies that U [ξ t (X), ψ t (X)]. Thus, {N X} = {N > t X} (ψ t (X) ξ t (X)). Unconditioning, t=0 {N} t=0 t=0 {ψ t (X) ξ t (X)}. To be finite, this requires a rate of decrease in the approximation error that is at least 1/t. One must also take into account that the computation of ξ t and ψ t may also grow with t. The bound above is indicative of the expected complexity only when, say, ξ t can be computed from ξ t 1 in one time unit. We cannot stress strongly enough how important the alternating series method is, as it frees us from having to compute f exactly. It is indeed the key to the solution of a host of difficult non-uniform random variate generation problems. 2. Uniformly bounded times If F is a class of distributions, it is useful to have generators for which the expected time is uniformly bounded over F. In the case of the rejection method, this often (but not always, see, e.g., the alternating series method) means bounding the rejection constant uniformly. For example, pre-1970 papers routinely recommended generating gamma (k) random variables as a sum of k independent exponentials, and generating a binomial (n, p) random variate as a sum of n indepoendent Bernoulli (p) random variables. Even today, many may still simulate a sum of n i.i.d. random variables by generating all n of them and summing. So, whether F is the class of all gamma distributions, all binomial distributions, or all sums of distributions with a fixed density f, the idea of a uniformly bounded time is important, and has, in fact, given new life to the area of non-uniform random variate generation. In this section, we briefly survey some key classes F, and provide some references to good generators. Many methods in the literature are rejection methods in disguise. For example, for realvalued random variates, the ratio of uniforms method is based upon the use of bounding curves of the form min (A, Bx ). 2 12

13 The area under the bounding curve is 4 AB, and a random variate with density proportional to it can be generated as B/A SU/V, where S is a random sign, and U, V are independent uniform [0, 1] random variates. The method is best applied after shifting the mode or mean of the density f to the origin. One could for example use A = sup x f(x), B = sup x 2 f(x). x In many cases, such as for mode-centered normals, gamma, beta, and Student t distributions, the rejection constant 4 AB is uniformly bounded over all or a large part of the parameter space. Discretized versions of it can be used for the Poisson, binomial and hypergeometric distributions. Uniformly fast generators have been obtained for all major parametric families. Interesting contributions include the gamma generators of Cheng (1977), Le Minh (1988), Marsaglia (1977), Ahrens and Dieter (1982), Cheng and Feast (1979, 1980), Schmeiser and Lal (1980), Ahrens, Kohrt and Dieter (1983), and Best (1983), the beta generators of Ahrens and Dieter (1974), Zechner and Stadlober (1993), Best (1978), and Schmeiser and Babu (1980), the binomial methods of Stadlober (1988, 1989), Ahrens and Dieter (1980), Hörmann (1993), and Kachitvichyanukul and Schmeiser (1988, 1989), the hypergeometric generators of Stadlober (1988, 1990), and the code for Poisson variates by Ahrens and Dieter (1980), and Hörmann (1993). Some of these algorithms are described in Devroye (1986a), where some additional uniformly fast methods can be found. All the distributions mentioned above are log-concave, and it is thus no surprise that we can find uniformly fast generators. The emphasis in most papers is on the details, to squeeze the last millisecond out of each piece of code. To conclude this section, we will just describe a recent gamma generator, due to Marsaglia and Tsang (2001). It uses almost-exact inversion. Its derivation is unlike anything in the present survey, and shows the careful planning that goes into the design of a nearly perfect generator. The idea is to consider a monotone transformation h(x), and to note that if X has density e g(x) def = h(x)a 1 e h(x) h (x), Γ(a) then Y = h(x) has the gamma (a) density. If g(x) is close to c x 2 /2 for some constant c, then a rejection method from a normal density can be developed. The choice suggested by Marsaglia and Tsang is h(x) = d(1 + cx) 3, 1/c < x <, with d = a 1/3, c = 1/ 9d. In that case, simple computations show that g(x) = 3d log(1 + cx) d(1 + cx) 3 + d + C, 1/c < x <, for some constant C. Define w(x) = x 2 /2 g(x). Note that w (x) = x 3dc + 3dc(1 + 1+cx cx)2, w (x) = 1 + 3dc2 + 6dc 2 (1 + cx), w (x) = 6dc3 + 6dc 3. The third derivative is zero at (1+cx) 2 (1+cx) 3 (1 + cx) 3 = 1, or x = 0. Thus, w reaches a minimum there, which is 1 + 9dc 2 = 0. Therefore, by Taylor s series expansion, and the fact that w(0) = C, w (0) = 0, we have w(x) C. Hence, e g(x) e C x2 /2 and the rejection method immediately yields the following: 13

14 d a 1/3 c 1/ 9d repeat generate E exponential generate X normal Y d(1 + cx) 3 until X 2 /2 E d log(y ) Y + d return Y We picked d so that e g C is nearly maximal. By the transformation y = d(1 + cx) 3 and by Stirling s approximation, as d, 1/c e g(x) C dx = 0 y d 2/3 e d y 3cd d+1/3 (1 + 1/3d)d+1/3 2π 3ce 1/3 d + 1/3 dy = Γ(d + 1/3)ed 3cd d+1/3 18πd 3 d + 1/3 2π. Using e x2 /2 dx = 2π, we note that the asymptotic (as a ) rejection constant is 1: we have a perfect fit! Marsaglia and Tsang recommend the method only for a 1. For additional speed, a squeeze step (or quick acceptance step) may be added. 3. Universal generators It is quite important to develop generators that work well for entire families of distributions. An example of such development is the generator for log concave densities given in Devroye (1984a), that is, densities for which log f is concave. On the real line, this has several consequences: these distributions are unimodal and have sub-exponential tails. Among the log concave densities, we find the normal densities, the gamma family with parameter a 1, the Weibull family with parameter a 1, the beta family with both parameters 1, the exponential power distribution with parameter 1 (the density being proportional to exp( x a )), the Gumbel or double exponential distribution (k k exp( kx ke x )/(k 1)! is the density, k 1 is the parameter), the generalized inverse Gaussian distribution, the logistic distribution, Perks distribution (with density proportional to 1/(e x + e x + a), a > 2), the hyperbolic secant distribution (with distribution function (2/π) arctan(e x )), and Kummer s distribution (for a definition, see Devroye, 1984a). Also, we should consider simple nonlinear transformations of other random variables. Examples include arctan X with X Pearson IV, log X (X being Student t for all parameters), log X (for X gamma, all parameters), log X (for X log-normal), log X (for X Pareto), log(x/(1 X)) (for all beta variates X), and log X (for X beta (a, b) for all a > 0, b 1). In its most primitive version, assuming one knows that a mode occurs at m, we consider the generation problem for the normalized random variable Y = f(m)(x m). One may obtain X as m + Y/f(m). The new random variable has a mode at 0 of value 1. Call its density g. Devroye (1984a) showed that g(y) min ( 1, e 1 y ). 14

15 The integral of the upper bound is precisely 4. Thus, the rejection constant will be 4, uniformly over this vast class of densities. A random variable with density proportional to the upper bound is generated by first flipping a perfect coin. If it is heads, then generate (1 + E)S, where E is exponential, and S is another perfect coin. Otherwise generate U S, where U is uniform [0, 1] and independent of S. The algorithm is as follows: repeat Generate U (a uniform on [0, 1]), E (exponential), and S (a fair random bit). Generate a random sign S. If S = 0 then Y 1 + E else Y V, V uniform [0, 1] Set Y Y S until U min ( 1, e 1 Y ) g(y ) return Y Various improvements can be found in the original paper. In adaptive rejection sampling, Gilks and Wild (1992) bound the density adaptively from above and below by a number of exponential pieces, to reduce the integral under the bounding curve. If we denote the log concave density by f, and if f is available, then for all x, log-concavity implies that This is equivalent to log f(y) log f(x) + (y x) f (x) f(x), y. f(y) f(x)e (y x)f (x)/f(x), y. Assume that we have already computed f at k points. Then we can construct k + 1 exponential pieces (with breakpoints at those k points) that bound the density from above, and in fact, sampling from the piecewise exponential dominating curve is easily carried out by inversion. The integral under a piece anchored at x and spanning [x, x + δ) is f 2 (x) ( e δf (x)/f(x) 1 ). f (x) Gilks and Wild also bound f from below. Assume that y [x, x ], where x and x are two of those anchor points. Then log(f(y)) log(f(x)) + y x x x (log(f(x )) log(f(x))). The lower bound can be used to avoid the evaluation of f most of the time when accepting or rejecting. It is only when f is actually computed that a new point is added to the collection to increase the number of exponential pieces. Hörmann (1995) shows that it is best to add two exponential tails that touch g at the places where g(x) = 1/e, assuming that the derivative of g is available. In that case, the area under the bounding curve is reduced to e/(e 1) Hörmann and Derflinger (1994) and Hörmann (1995) (see also Leydold (2000a, 2000b, 2001) and Hörmann, Leydold and Derflinger (2004)) define the notion of T c -concavity, related to the concavity of 1/(f(x)) c, where 0 < c < 1. For example, the Student t distribution of parameter 1 is T 1/2 -concave. Similar universal methods related to the one above are developed by them. Rejection is done 15

16 from a curve with polynomial tails, and the rejection constant is ( 1 (1 c) 1/c 1) 1. The notion of T -convexity proposed by Evans and Swartz (1996, 2000) is useful for unbounded densities. Discrete distributions are called log concave if for all n, Some examples are shown in the table below. p 2 n p n 1 p n+1. Name Probability vector p k Parameter(s) Binomial (n, p) ) p k (1 p) n k (0 k n) n 1, 0 p 1 Poisson (λ) Negative binomial (n, p) ( n k ( n+k 1 n 1 λ k e λ (k 0) λ > 0 k! ) (1 p)k p n (k 0) n 1, 0 p 1 Geometric (p) p(1 p) k (k 0) 0 p 1 Logarithmic series (p) Hypergeometric (b, w, n) ( w k)( b p k k! log(1/(1 p)) n k) ( w+b n ) (k 1) 0 < p < 1 (0 k min(w, n)) n 1, b 0, w 0 Table 2: Some important discrete distributions and their parameter ranges. Others not shown here included the Pólya-Eggenberger distribution (see Johnson and Kotz, 1969) and the hyper-poisson distribution. For all discrete log-concave distributions having a mode at m, we have (Devroye, 1987a): p m+k p m min ( 1, e 1 pm k ), all k. Mimicking the development for densities, we may use rejection from a curve that is flat in the middle, and has two exponential tails. After taking care of rounding to the nearest integer, we obtain the following code, in which the expected number of iterations before halting is 4+p m 5 for all distributions in the family. 16

17 Compute w 1 + p m /2 (once) repeat Generate U, V, W uniformly on [0, 1], and let S be a random sign. If U w then Y V w/p 1+w m else Y (w log V )/p m X S round(y ) until W min (1, e w pmy ) p m+x /p m return m + X We can tighten the rejection constant further to 2 + p m for log concave distributions that are monotonically decreasing. The disadvantage in the code is the requirement that p m must be computed. The development for log-concave distributions can be aped for other classes. For example, the adaptive rejection method of Gilks and Wild can be used for unimodal densities with known mode m. One just needs to replace the exponential upper and lower bounds by histograms. This is covered in an exercise in Devroye (1986a). General algorithms are known for all Lipschitz densities with known bounds on the Lipschitz constant C and variance σ 2, and all unimodal densities with mode at m, and variance bounded by σ 2, to give two examples (see Devroye, 1984b, 1986a). 4. Indirect problems A certain amount of work is devoted to indirect problems, in which distributions are not described by their density, distribution function or discrete probability vector. A partial list follows, with brief descriptions on how the solutions may be constructed Characteristic functions. If the characteristic function ϕ(t) = { e itx} is known in black box format, very little can be done in a universal manner. In particular cases, we may have enough information to be able to deduce that a density exists and that it is bounded in a certain way. For example, from the inversion formula f(x) = 1 ϕ(t)e itx dt, 2π we deduce the bounds and Thus, we have sup f(x) M def = 1 x 2π ϕ(t) dt sup x 2 f(x) M def = 1 x 2π ϕ (t) dt f(x) min(m, M /x 2 ), a bound that is easily dealt with by rejection. In principle, then, rejection can be used, with f(x) approximated at will by integral approximations based on the inversion integral. This method 17

18 requires bounds on the integral approximation, and these depend in turn on the smoothness of ϕ. Explicit bounds on some smoothness parameters must be available. There are descriptions and worked out examples in Devroye (1981b, 1986b, 1988). Devroye (1988) uses this method to simulate the sum S n of n i.i.d. random variables with common characteristic function ϕ in expected time not depending upon n, as S n has characteristic function ϕ n. Pólya showed that any convex function ϕ on the positive halfline that decreases from 1 to 0 is the characteristic function of a symmetric random variable if we extend it to the real line by setting ϕ( t) = ϕ(t). These distributions correspond to random variables distributed as Y/Z, where Y and Z are independent, Y has the de la Vallée-Poussin density 1 2π ( ) 2 sin(x/2), x x/2 (for which random variate generation by rejection is trivial), and Z has distribution function on the positive halfline given by 1 ϕ(t) + tϕ (t). See Dugué and Girault (1955) for this property and Devroye (1984a) for its implications in random variate generation. Note that we have made a space warp from the complex to the real domain. This has unexpected corollaries. For example, if E 1, E 2 are independent exponential random variables, U is a uniform [0, 1] random variable, and α (0, 1], then Y/Z with Z = (E 1 + E 2 U<α ) 1 α is symmetric stable of parameter (α), with characteristic function e t α. distribution (Laha, 1961) with parameter α (0, 1] is described by Here Y/Z with Z = ϕ(t) = t α. ( α + 1 (α + 1) 2 4αU 2U 1 ) 1 α The Linnik-Laha has the desired distribution Fourier coefficients. Assume that the density f, suitably scaled and translated, is supported on [ π, π]. Then the Fourier coefficients are given by { } { } cos(kx) sin(kx) a k =, b k =, k 0. π π They uniquely describe the distribution. If the series absolutely converges, a k + b k <, k then the following trigonometric series is absolutely and uniformly convergent to f: f(x) = a (a k cos(kx) + b k sin(kx)). k=1 18

19 If we only use terms with index up to n, then the error made is not more than R n+1 = a 2 k + b 2 k. k=n+1 If we know bounds on R n, then one can use rejection with an alternating series method to generate random variates (Devroye, 1989). A particularly simple situation occurs when only the cosine coefficients are nonzero, as occurs for symmetric distributions. Assume furthermore that the Fourier cosine series coefficients a k are convex and a k 0 as k. Even without absolute convergence, the Fourier cosine series converges, and can in fact be rewritten as follows: f(x) = π(k + 1) 2 a k K k (x), where 2 a k = a k+2 2a k+1 + a k (a positive number, by convexity), and ( ) 2 1 sin((k + 1)x/2) K k (x) =, x π, 2π(k + 1) sin(x/2) k=0 is the Fejer kernel. As K k (x) min((k + 1)/4, π/(2(k + 1)x 2 ), random variates from it can be generated in expected time uniformly bounded in k by rejection. Thus, the following simple algorithm does the job: Generate U uniformly on [0, 1]. Z 0, S π 2 a 0. while U > S do Z Z + 1. S S + π(z + 1) 2 a Z. Generate X with Fejer density K Z. return X 4.3. The moments are known. Let µ n denote the n-th moment of a random variable X. One could ask to generate a random variate when for each n, µ n is computable (in a black box). Again, this is a nearly impossible problem to solve, unless additional information is available. For one thing, there may not be a unique solution. A sufficient condition that guarantees the uniqueness is Carleman s condition µ 2n 1 2n = n=0 (see Akhiezer, 1965). Sufficient conditions in terms of a density f exist, such as Krein s condition log(f(x)) 1 + x 2 dx =, combined with Lin s condition, applicable to symmetric and differentiable densities, which states that x f (x) /f(x) as x (Lin, 1997, Krein, 1944, Stoyanov, 2000). Whenever the distribution is of compact support, the moments determine the distribution. In fact, there are 19

20 various ways for reconstructing the density from the moments. An example is provided by the series f(x) = a k φ j (x), x 1, k=0 where φ k is the Legendre polynomial of degree k, and a k is a linear function of all moments up to the k-th. The series truncated at index n has an error not exceeding C r f (r+1) (1 x 2 ) 1/4 n r 1/2 where r is an integer and C r is a constant depending upon r only (Jackson, 1930). Rejection with dominating curve of the form C /(1 x 2 ) 1/4 (a symmetric beta) and with the alternating series method can thus lead to a generator (Devroye, 1989). Let The situation is much more interesting when X is supported on the positive integers. M r = {X(X 1) (X r + 1)} denote the r-th factorial moment. Then, approximate p j = p nj = 1 n ( 1) i M j+i. j! i! i=0 {X = j} by Note that p nj p j for n even, p nj p j for n odd, and p nj p j as n provided that {(1 + u) X } < for some u > 0. Under this condition, an alternating series rejection method can be implemented, with a dominating curve suggested by the crude bound p j min ( 1, µ r j r In fact, we may even attempt inversion, as the partial sums p 0 + p p j are available with any desired accuracy: it suffices to increase n until we are absolutely sure in which interval a given uniform random variate U lies. For more details, see Devroye (1991). ) The moment generating function. For random variables on the positive integers, as in the previous section, we may have knowledge of the moment generating function Here we have trivially, k(s) = p 0 + p 1 s + p 2 s 2 + = { s X}. p j k(s), s j where s > 0 can be picked at will. Approximations for p j can be constructed based on differences of higher orders. For example, if we take t > 0 arbitrary, then we can define j i=0 p nj = ( 1)j i( j i) k(it) j!t j and note that 0 p nj p j 1 1. (1 jt) j+1 This is sufficient to apply the alternating series method (Devroye, 1991). 20

### Probability and Statistics

CHAPTER 2: RANDOM VARIABLES AND ASSOCIATED FUNCTIONS 2b - 0 Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be

### Exercises with solutions (1)

Exercises with solutions (). Investigate the relationship between independence and correlation. (a) Two random variables X and Y are said to be correlated if and only if their covariance C XY is not equal

### Some Notes on Taylor Polynomials and Taylor Series

Some Notes on Taylor Polynomials and Taylor Series Mark MacLean October 3, 27 UBC s courses MATH /8 and MATH introduce students to the ideas of Taylor polynomials and Taylor series in a fairly limited

### 3. Continuous Random Variables

3. Continuous Random Variables A continuous random variable is one which can take any value in an interval (or union of intervals) The values that can be taken by such a variable cannot be listed. Such

### This document is published at www.agner.org/random, Feb. 2008, as part of a software package.

Sampling methods by Agner Fog This document is published at www.agner.org/random, Feb. 008, as part of a software package. Introduction A C++ class library of non-uniform random number generators is available

### 1 if 1 x 0 1 if 0 x 1

Chapter 3 Continuity In this chapter we begin by defining the fundamental notion of continuity for real valued functions of a single real variable. When trying to decide whether a given function is or

### MATH 461: Fourier Series and Boundary Value Problems

MATH 461: Fourier Series and Boundary Value Problems Chapter III: Fourier Series Greg Fasshauer Department of Applied Mathematics Illinois Institute of Technology Fall 2015 fasshauer@iit.edu MATH 461 Chapter

### Calculus C/Multivariate Calculus Advanced Placement G/T Essential Curriculum

Calculus C/Multivariate Calculus Advanced Placement G/T Essential Curriculum UNIT I: The Hyperbolic Functions basic calculus concepts, including techniques for curve sketching, exponential and logarithmic

### Lecture 4: Random Variables

Lecture 4: Random Variables 1. Definition of Random variables 1.1 Measurable functions and random variables 1.2 Reduction of the measurability condition 1.3 Transformation of random variables 1.4 σ-algebra

### PUTNAM TRAINING POLYNOMIALS. Exercises 1. Find a polynomial with integral coefficients whose zeros include 2 + 5.

PUTNAM TRAINING POLYNOMIALS (Last updated: November 17, 2015) Remark. This is a list of exercises on polynomials. Miguel A. Lerma Exercises 1. Find a polynomial with integral coefficients whose zeros include

### Senior Secondary Australian Curriculum

Senior Secondary Australian Curriculum Mathematical Methods Glossary Unit 1 Functions and graphs Asymptote A line is an asymptote to a curve if the distance between the line and the curve approaches zero

### 1. Periodic Fourier series. The Fourier expansion of a 2π-periodic function f is:

CONVERGENCE OF FOURIER SERIES 1. Periodic Fourier series. The Fourier expansion of a 2π-periodic function f is: with coefficients given by: a n = 1 π f(x) a 0 2 + a n cos(nx) + b n sin(nx), n 1 f(x) cos(nx)dx

### THE CENTRAL LIMIT THEOREM TORONTO

THE CENTRAL LIMIT THEOREM DANIEL RÜDT UNIVERSITY OF TORONTO MARCH, 2010 Contents 1 Introduction 1 2 Mathematical Background 3 3 The Central Limit Theorem 4 4 Examples 4 4.1 Roulette......................................

### The Heat Equation. Lectures INF2320 p. 1/88

The Heat Equation Lectures INF232 p. 1/88 Lectures INF232 p. 2/88 The Heat Equation We study the heat equation: u t = u xx for x (,1), t >, (1) u(,t) = u(1,t) = for t >, (2) u(x,) = f(x) for x (,1), (3)

### Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 17 Shannon-Fano-Elias Coding and Introduction to Arithmetic Coding

### Course 221: Analysis Academic year , First Semester

Course 221: Analysis Academic year 2007-08, First Semester David R. Wilkins Copyright c David R. Wilkins 1989 2007 Contents 1 Basic Theorems of Real Analysis 1 1.1 The Least Upper Bound Principle................

### Probability density function : An arbitrary continuous random variable X is similarly described by its probability density function f x = f X

Week 6 notes : Continuous random variables and their probability densities WEEK 6 page 1 uniform, normal, gamma, exponential,chi-squared distributions, normal approx'n to the binomial Uniform [,1] random

### x if x 0, x if x < 0.

Chapter 3 Sequences In this chapter, we discuss sequences. We say what it means for a sequence to converge, and define the limit of a convergent sequence. We begin with some preliminary results about the

### The Exponential Distribution

21 The Exponential Distribution From Discrete-Time to Continuous-Time: In Chapter 6 of the text we will be considering Markov processes in continuous time. In a sense, we already have a very good understanding

### Fourier Series. A Fourier series is an infinite series of the form. a + b n cos(nωx) +

Fourier Series A Fourier series is an infinite series of the form a b n cos(nωx) c n sin(nωx). Virtually any periodic function that arises in applications can be represented as the sum of a Fourier series.

### Inner Product Spaces

Math 571 Inner Product Spaces 1. Preliminaries An inner product space is a vector space V along with a function, called an inner product which associates each pair of vectors u, v with a scalar u, v, and

### FIRST YEAR CALCULUS. Chapter 7 CONTINUITY. It is a parabola, and we can draw this parabola without lifting our pencil from the paper.

FIRST YEAR CALCULUS WWLCHENW L c WWWL W L Chen, 1982, 2008. 2006. This chapter originates from material used by the author at Imperial College, University of London, between 1981 and 1990. It It is is

### Learning Objectives for Math 165

Learning Objectives for Math 165 Chapter 2 Limits Section 2.1: Average Rate of Change. State the definition of average rate of change Describe what the rate of change does and does not tell us in a given

### t := maxγ ν subject to ν {0,1,2,...} and f(x c +γ ν d) f(x c )+cγ ν f (x c ;d).

1. Line Search Methods Let f : R n R be given and suppose that x c is our current best estimate of a solution to P min x R nf(x). A standard method for improving the estimate x c is to choose a direction

### Joint Probability Distributions and Random Samples (Devore Chapter Five)

Joint Probability Distributions and Random Samples (Devore Chapter Five) 1016-345-01 Probability and Statistics for Engineers Winter 2010-2011 Contents 1 Joint Probability Distributions 1 1.1 Two Discrete

### Sums of Independent Random Variables

Chapter 7 Sums of Independent Random Variables 7.1 Sums of Discrete Random Variables In this chapter we turn to the important question of determining the distribution of a sum of independent random variables

### Understanding Basic Calculus

Understanding Basic Calculus S.K. Chung Dedicated to all the people who have helped me in my life. i Preface This book is a revised and expanded version of the lecture notes for Basic Calculus and other

### Mathematics Course 111: Algebra I Part IV: Vector Spaces

Mathematics Course 111: Algebra I Part IV: Vector Spaces D. R. Wilkins Academic Year 1996-7 9 Vector Spaces A vector space over some field K is an algebraic structure consisting of a set V on which are

### 1 Limiting distribution for a Markov chain

Copyright c 2009 by Karl Sigman Limiting distribution for a Markov chain In these Lecture Notes, we shall study the limiting behavior of Markov chains as time n In particular, under suitable easy-to-check

### Fourier Series. Chapter Some Properties of Functions Goal Preliminary Remarks

Chapter 3 Fourier Series 3.1 Some Properties of Functions 3.1.1 Goal We review some results about functions which play an important role in the development of the theory of Fourier series. These results

### Lectures 5-6: Taylor Series

Math 1d Instructor: Padraic Bartlett Lectures 5-: Taylor Series Weeks 5- Caltech 213 1 Taylor Polynomials and Series As we saw in week 4, power series are remarkably nice objects to work with. In particular,

### Limit processes are the basis of calculus. For example, the derivative. f f (x + h) f (x)

SEC. 4.1 TAYLOR SERIES AND CALCULATION OF FUNCTIONS 187 Taylor Series 4.1 Taylor Series and Calculation of Functions Limit processes are the basis of calculus. For example, the derivative f f (x + h) f

### k=1 k2, and therefore f(m + 1) = f(m) + (m + 1) 2 =

Math 104: Introduction to Analysis SOLUTIONS Alexander Givental HOMEWORK 1 1.1. Prove that 1 2 +2 2 + +n 2 = 1 n(n+1)(2n+1) for all n N. 6 Put f(n) = n(n + 1)(2n + 1)/6. Then f(1) = 1, i.e the theorem

### CHAPTER II THE LIMIT OF A SEQUENCE OF NUMBERS DEFINITION OF THE NUMBER e.

CHAPTER II THE LIMIT OF A SEQUENCE OF NUMBERS DEFINITION OF THE NUMBER e. This chapter contains the beginnings of the most important, and probably the most subtle, notion in mathematical analysis, i.e.,

### RECURSIVE TREES. Michael Drmota. Institue of Discrete Mathematics and Geometry Vienna University of Technology, A 1040 Wien, Austria

RECURSIVE TREES Michael Drmota Institue of Discrete Mathematics and Geometry Vienna University of Technology, A 040 Wien, Austria michael.drmota@tuwien.ac.at www.dmg.tuwien.ac.at/drmota/ 2008 Enrage Topical

### Taylor and Maclaurin Series

Taylor and Maclaurin Series In the preceding section we were able to find power series representations for a certain restricted class of functions. Here we investigate more general problems: Which functions

Adaptive Online Gradient Descent Peter L Bartlett Division of Computer Science Department of Statistics UC Berkeley Berkeley, CA 94709 bartlett@csberkeleyedu Elad Hazan IBM Almaden Research Center 650

### Metric Spaces. Chapter 7. 7.1. Metrics

Chapter 7 Metric Spaces A metric space is a set X that has a notion of the distance d(x, y) between every pair of points x, y X. The purpose of this chapter is to introduce metric spaces and give some

### The Dirichlet Unit Theorem

Chapter 6 The Dirichlet Unit Theorem As usual, we will be working in the ring B of algebraic integers of a number field L. Two factorizations of an element of B are regarded as essentially the same if

### Fourier Series Chapter 3 of Coleman

Fourier Series Chapter 3 of Coleman Dr. Doreen De eon Math 18, Spring 14 1 Introduction Section 3.1 of Coleman The Fourier series takes its name from Joseph Fourier (1768-183), who made important contributions

### Fourier Series and Sturm-Liouville Eigenvalue Problems

Fourier Series and Sturm-Liouville Eigenvalue Problems 2009 Outline Functions Fourier Series Representation Half-range Expansion Convergence of Fourier Series Parseval s Theorem and Mean Square Error Complex

### correct-choice plot f(x) and draw an approximate tangent line at x = a and use geometry to estimate its slope comment The choices were:

Topic 1 2.1 mode MultipleSelection text How can we approximate the slope of the tangent line to f(x) at a point x = a? This is a Multiple selection question, so you need to check all of the answers that

### Chapter 7. Induction and Recursion. Part 1. Mathematical Induction

Chapter 7. Induction and Recursion Part 1. Mathematical Induction The principle of mathematical induction is this: to establish an infinite sequence of propositions P 1, P 2, P 3,..., P n,... (or, simply

### Chapter 4 Lecture Notes

Chapter 4 Lecture Notes Random Variables October 27, 2015 1 Section 4.1 Random Variables A random variable is typically a real-valued function defined on the sample space of some experiment. For instance,

### Fourier series. Jan Philip Solovej. English summary of notes for Analysis 1. May 8, 2012

Fourier series Jan Philip Solovej English summary of notes for Analysis 1 May 8, 2012 1 JPS, Fourier series 2 Contents 1 Introduction 2 2 Fourier series 3 2.1 Periodic functions, trigonometric polynomials

### Mathematical Methods of Engineering Analysis

Mathematical Methods of Engineering Analysis Erhan Çinlar Robert J. Vanderbei February 2, 2000 Contents Sets and Functions 1 1 Sets................................... 1 Subsets.............................

### Mathematics (MAT) Faculty Mary Hayward, Chair Charles B. Carey Garnet Hauger, Affiliate Tim Wegner

Mathematics (MAT) Faculty Mary Hayward, Chair Charles B. Carey Garnet Hauger, Affiliate Tim Wegner About the discipline The number of applications of mathematics has grown enormously in the natural, physical

### Tangent and normal lines to conics

4.B. Tangent and normal lines to conics Apollonius work on conics includes a study of tangent and normal lines to these curves. The purpose of this document is to relate his approaches to the modern viewpoints

### 4. Joint Distributions

Virtual Laboratories > 2. Distributions > 1 2 3 4 5 6 7 8 4. Joint Distributions Basic Theory As usual, we start with a random experiment with probability measure P on an underlying sample space. Suppose

### CALCULUS 2. 0 Repetition. tutorials 2015/ Find limits of the following sequences or prove that they are divergent.

CALCULUS tutorials 5/6 Repetition. Find limits of the following sequences or prove that they are divergent. a n = n( ) n, a n = n 3 7 n 5 n +, a n = ( n n 4n + 7 ), a n = n3 5n + 3 4n 7 3n, 3 ( ) 3n 6n

### MATHEMATICAL METHODS OF STATISTICS

MATHEMATICAL METHODS OF STATISTICS By HARALD CRAMER TROFESSOK IN THE UNIVERSITY OF STOCKHOLM Princeton PRINCETON UNIVERSITY PRESS 1946 TABLE OF CONTENTS. First Part. MATHEMATICAL INTRODUCTION. CHAPTERS

### The sample space for a pair of die rolls is the set. The sample space for a random number between 0 and 1 is the interval [0, 1].

Probability Theory Probability Spaces and Events Consider a random experiment with several possible outcomes. For example, we might roll a pair of dice, flip a coin three times, or choose a random real

### 3. INNER PRODUCT SPACES

. INNER PRODUCT SPACES.. Definition So far we have studied abstract vector spaces. These are a generalisation of the geometric spaces R and R. But these have more structure than just that of a vector space.

### Lecture VI. Review of even and odd functions Definition 1 A function f(x) is called an even function if. f( x) = f(x)

ecture VI Abstract Before learning to solve partial differential equations, it is necessary to know how to approximate arbitrary functions by infinite series, using special families of functions This process

### Transformations and Expectations of random variables

Transformations and Epectations of random variables X F X (): a random variable X distributed with CDF F X. Any function Y = g(x) is also a random variable. If both X, and Y are continuous random variables,

### M2S1 Lecture Notes. G. A. Young http://www2.imperial.ac.uk/ ayoung

M2S1 Lecture Notes G. A. Young http://www2.imperial.ac.uk/ ayoung September 2011 ii Contents 1 DEFINITIONS, TERMINOLOGY, NOTATION 1 1.1 EVENTS AND THE SAMPLE SPACE......................... 1 1.1.1 OPERATIONS

### MATH 425, PRACTICE FINAL EXAM SOLUTIONS.

MATH 45, PRACTICE FINAL EXAM SOLUTIONS. Exercise. a Is the operator L defined on smooth functions of x, y by L u := u xx + cosu linear? b Does the answer change if we replace the operator L by the operator

### A Coefficient of Variation for Skewed and Heavy-Tailed Insurance Losses. Michael R. Powers[ 1 ] Temple University and Tsinghua University

A Coefficient of Variation for Skewed and Heavy-Tailed Insurance Losses Michael R. Powers[ ] Temple University and Tsinghua University Thomas Y. Powers Yale University [June 2009] Abstract We propose a

### Infinite series, improper integrals, and Taylor series

Chapter Infinite series, improper integrals, and Taylor series. Introduction This chapter has several important and challenging goals. The first of these is to understand how concepts that were discussed

### Undergraduate Notes in Mathematics. Arkansas Tech University Department of Mathematics

Undergraduate Notes in Mathematics Arkansas Tech University Department of Mathematics An Introductory Single Variable Real Analysis: A Learning Approach through Problem Solving Marcel B. Finan c All Rights

### General theory of stochastic processes

CHAPTER 1 General theory of stochastic processes 1.1. Definition of stochastic process First let us recall the definition of a random variable. A random variable is a random number appearing as a result

### MOP 2007 Black Group Integer Polynomials Yufei Zhao. Integer Polynomials. June 29, 2007 Yufei Zhao yufeiz@mit.edu

Integer Polynomials June 9, 007 Yufei Zhao yufeiz@mit.edu We will use Z[x] to denote the ring of polynomials with integer coefficients. We begin by summarizing some of the common approaches used in dealing

### INTERPOLATION. Interpolation is a process of finding a formula (often a polynomial) whose graph will pass through a given set of points (x, y).

INTERPOLATION Interpolation is a process of finding a formula (often a polynomial) whose graph will pass through a given set of points (x, y). As an example, consider defining and x 0 =0, x 1 = π 4, x

### 9 More on differentiation

Tel Aviv University, 2013 Measure and category 75 9 More on differentiation 9a Finite Taylor expansion............... 75 9b Continuous and nowhere differentiable..... 78 9c Differentiable and nowhere monotone......

### 2WB05 Simulation Lecture 8: Generating random variables

2WB05 Simulation Lecture 8: Generating random variables Marko Boon http://www.win.tue.nl/courses/2wb05 January 7, 2013 Outline 2/36 1. How do we generate random variables? 2. Fitting distributions Generating

### Random variables, probability distributions, binomial random variable

Week 4 lecture notes. WEEK 4 page 1 Random variables, probability distributions, binomial random variable Eample 1 : Consider the eperiment of flipping a fair coin three times. The number of tails that

### MATHEMATICS (CLASSES XI XII)

MATHEMATICS (CLASSES XI XII) General Guidelines (i) All concepts/identities must be illustrated by situational examples. (ii) The language of word problems must be clear, simple and unambiguous. (iii)

### Limits and convergence.

Chapter 2 Limits and convergence. 2.1 Limit points of a set of real numbers 2.1.1 Limit points of a set. DEFINITION: A point x R is a limit point of a set E R if for all ε > 0 the set (x ε,x + ε) E is

### An Introduction to Separation of Variables with Fourier Series Math 391w, Spring 2010 Tim McCrossen Professor Haessig

An Introduction to Separation of Variables with Fourier Series Math 391w, Spring 2010 Tim McCrossen Professor Haessig Abstract: This paper aims to give students who have not yet taken a course in partial

### Chapter 4, Arithmetic in F [x] Polynomial arithmetic and the division algorithm.

Chapter 4, Arithmetic in F [x] Polynomial arithmetic and the division algorithm. We begin by defining the ring of polynomials with coefficients in a ring R. After some preliminary results, we specialize

### THE FUNDAMENTAL THEOREM OF ALGEBRA VIA PROPER MAPS

THE FUNDAMENTAL THEOREM OF ALGEBRA VIA PROPER MAPS KEITH CONRAD 1. Introduction The Fundamental Theorem of Algebra says every nonconstant polynomial with complex coefficients can be factored into linear

### Module MA1S11 (Calculus) Michaelmas Term 2016 Section 3: Functions

Module MA1S11 (Calculus) Michaelmas Term 2016 Section 3: Functions D. R. Wilkins Copyright c David R. Wilkins 2016 Contents 3 Functions 43 3.1 Functions between Sets...................... 43 3.2 Injective

### CHAPTER 3. Sequences. 1. Basic Properties

CHAPTER 3 Sequences We begin our study of analysis with sequences. There are several reasons for starting here. First, sequences are the simplest way to introduce limits, the central idea of calculus.

### Roots of Polynomials

Roots of Polynomials (Com S 477/577 Notes) Yan-Bin Jia Sep 24, 2015 A direct corollary of the fundamental theorem of algebra is that p(x) can be factorized over the complex domain into a product a n (x

### 6. Distribution and Quantile Functions

Virtual Laboratories > 2. Distributions > 1 2 3 4 5 6 7 8 6. Distribution and Quantile Functions As usual, our starting point is a random experiment with probability measure P on an underlying sample spac

### Solution Using the geometric series a/(1 r) = x=1. x=1. Problem For each of the following distributions, compute

Math 472 Homework Assignment 1 Problem 1.9.2. Let p(x) 1/2 x, x 1, 2, 3,..., zero elsewhere, be the pmf of the random variable X. Find the mgf, the mean, and the variance of X. Solution 1.9.2. Using the

### MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES Contents 1. Random variables and measurable functions 2. Cumulative distribution functions 3. Discrete

### Important Probability Distributions OPRE 6301

Important Probability Distributions OPRE 6301 Important Distributions... Certain probability distributions occur with such regularity in real-life applications that they have been given their own names.

### Sequences and Series

Sequences and Series Consider the following sum: 2 + 4 + 8 + 6 + + 2 i + The dots at the end indicate that the sum goes on forever. Does this make sense? Can we assign a numerical value to an infinite

### INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS

INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS STEVEN P. LALLEY AND ANDREW NOBEL Abstract. It is shown that there are no consistent decision rules for the hypothesis testing problem

### Figure 2.1: Center of mass of four points.

Chapter 2 Bézier curves are named after their inventor, Dr. Pierre Bézier. Bézier was an engineer with the Renault car company and set out in the early 196 s to develop a curve formulation which would

### MATH 132: CALCULUS II SYLLABUS

MATH 32: CALCULUS II SYLLABUS Prerequisites: Successful completion of Math 3 (or its equivalent elsewhere). Math 27 is normally not a sufficient prerequisite for Math 32. Required Text: Calculus: Early

### MATH31011/MATH41011/MATH61011: FOURIER ANALYSIS AND LEBESGUE INTEGRATION. Chapter 4: Fourier Series and L 2 ([ π, π], µ) ( 1 π

MATH31011/MATH41011/MATH61011: FOURIER ANALYSIS AND LEBESGUE INTEGRATION Chapter 4: Fourier Series and L ([, π], µ) Square Integrable Functions Definition. Let f : [, π] R be measurable. We say that f

### Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special Distributions-VI Today, I am going to introduce

### ECEN 5682 Theory and Practice of Error Control Codes

ECEN 5682 Theory and Practice of Error Control Codes Convolutional Codes University of Colorado Spring 2007 Linear (n, k) block codes take k data symbols at a time and encode them into n code symbols.

### Spatial Statistics Chapter 3 Basics of areal data and areal data modeling

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data

### Chapter 4 Expected Values

Chapter 4 Expected Values 4. The Expected Value of a Random Variables Definition. Let X be a random variable having a pdf f(x). Also, suppose the the following conditions are satisfied: x f(x) converges

### IEOR 6711: Stochastic Models I Fall 2012, Professor Whitt, Tuesday, September 11 Normal Approximations and the Central Limit Theorem

IEOR 6711: Stochastic Models I Fall 2012, Professor Whitt, Tuesday, September 11 Normal Approximations and the Central Limit Theorem Time on my hands: Coin tosses. Problem Formulation: Suppose that I have

### Problem 1 (10 pts) Find the radius of convergence and interval of convergence of the series

1 Problem 1 (10 pts) Find the radius of convergence and interval of convergence of the series a n n=1 n(x + 2) n 5 n 1. n(x + 2)n Solution: Do the ratio test for the absolute convergence. Let a n =. Then,

### Fourier Series, Integrals, and Transforms

Chap. Sec.. Fourier Series, Integrals, and Transforms Fourier Series Content: Fourier series (5) and their coefficients (6) Calculation of Fourier coefficients by integration (Example ) Reason hy (6) gives

### Functions and Equations

Centre for Education in Mathematics and Computing Euclid eworkshop # Functions and Equations c 014 UNIVERSITY OF WATERLOO Euclid eworkshop # TOOLKIT Parabolas The quadratic f(x) = ax + bx + c (with a,b,c

### Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

### Sequences and Convergence in Metric Spaces

Sequences and Convergence in Metric Spaces Definition: A sequence in a set X (a sequence of elements of X) is a function s : N X. We usually denote s(n) by s n, called the n-th term of s, and write {s

### The Goldberg Rao Algorithm for the Maximum Flow Problem

The Goldberg Rao Algorithm for the Maximum Flow Problem COS 528 class notes October 18, 2006 Scribe: Dávid Papp Main idea: use of the blocking flow paradigm to achieve essentially O(min{m 2/3, n 1/2 }

### Gambling Systems and Multiplication-Invariant Measures

Gambling Systems and Multiplication-Invariant Measures by Jeffrey S. Rosenthal* and Peter O. Schwartz** (May 28, 997.. Introduction. This short paper describes a surprising connection between two previously

### Week 9-10: Recurrence Relations and Generating Functions

Week 9-10: Recurrence Relations and Generating Functions March 31, 2015 1 Some number sequences An infinite sequence (or just a sequence for short is an ordered array a 0, a 1, a 2,..., a n,... of countably

### An important theme in this book is to give constructive definitions of mathematical objects. Thus, for instance, if you needed to evaluate.

Chapter 10 Series and Approximations An important theme in this book is to give constructive definitions of mathematical objects. Thus, for instance, if you needed to evaluate 1 0 e x2 dx, you could set

### 1 Mathematical Induction

Extra Credit Homework Problems Note: these problems are of varying difficulty, so you might want to assign different point values for the different problems. I have suggested the point values each problem