II.D Many Random Variables

With more than one random variable, the set of outcomes is an $N$-dimensional space, $S_{\mathbf{x}} = \{-\infty < x_1, x_2, \cdots, x_N < \infty\}$. For example, describing the location and velocity of a gas particle requires six coordinates. The joint PDF $p(\mathbf{x})$ is the probability density of an outcome in a volume element $d^N\mathbf{x} = \prod_{i=1}^{N} dx_i$ around the point $\mathbf{x} = \{x_1, x_2, \cdots, x_N\}$. The joint PDF is normalized such that

$$p_{\mathbf{x}}(S) = \int d^N\mathbf{x}\; p(\mathbf{x}) = 1. \tag{II.32}$$

If, and only if, the random variables are independent, the joint PDF is the product of individual PDFs,

$$p(\mathbf{x}) = \prod_{i=1}^{N} p_i(x_i). \tag{II.33}$$

The unconditional PDF describes the behavior of a subset of random variables, independent of the values of the others. For example, if we are interested only in the location of a gas particle, an unconditional PDF can be constructed by integrating over all velocities at a given location, $p(\vec{x}) = \int d^3\vec{v}\; p(\vec{x}, \vec{v})$; more generally

$$p(x_1, \cdots, x_m) = \int \prod_{i=m+1}^{N} dx_i\; p(x_1, \cdots, x_N). \tag{II.34}$$

The conditional PDF describes the behavior of a subset of random variables, for specified values of the others. For example, the PDF for the velocity of a particle at a particular location $\vec{x}$, denoted by $p(\vec{v}\,|\,\vec{x})$, is proportional to the joint PDF, $p(\vec{v}\,|\,\vec{x}) = p(\vec{x}, \vec{v})/\mathcal{N}$. The constant of proportionality, obtained by normalizing $p(\vec{v}\,|\,\vec{x})$, is

$$\mathcal{N} = \int d^3\vec{v}\; p(\vec{x}, \vec{v}) = p(\vec{x}), \tag{II.35}$$

the unconditional PDF for a particle at $\vec{x}$. In general, the conditional PDFs are related to the joint PDFs by Bayes' theorem, as

$$p(x_1, \cdots, x_m \,|\, x_{m+1}, \cdots, x_N) = \frac{p(x_1, \cdots, x_N)}{p(x_{m+1}, \cdots, x_N)}. \tag{II.36}$$

Note that if the random variables are independent, the conditional PDF is equal to the unconditional PDF.
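As an added numerical illustration (not part of the original notes), the following Python sketch discretizes a two-variable joint PDF on a grid and checks eqs.(II.34)-(II.36): the unconditional PDF follows by summing out one variable, and the conditional PDF is the joint PDF divided by it. The particular joint distribution, the grid, and the point $x_0$ are arbitrary choices for the example.

```python
# Minimal numerical sketch of eqs.(II.34)-(II.36); the joint PDF chosen here
# (a correlated two-dimensional Gaussian) is just an illustrative example.
import numpy as np

x = np.linspace(-5, 5, 401)
y = np.linspace(-5, 5, 401)
dx = x[1] - x[0]
X, Y = np.meshgrid(x, y, indexing="ij")

# joint PDF p(x, y) with correlation between the two variables
p_joint = np.exp(-(X**2 - 1.2 * X * Y + Y**2))
p_joint /= p_joint.sum() * dx * dx               # normalize as in eq.(II.32)

# unconditional (marginal) PDF, eq.(II.34): integrate out y
p_x = p_joint.sum(axis=1) * dx

# conditional PDF, eq.(II.36): p(y|x0) = p(x0, y) / p(x0)
i0 = 250                                          # an arbitrary grid point x0
p_y_given_x0 = p_joint[i0, :] / p_x[i0]

print("normalization of p(x)   :", p_x.sum() * dx)           # ~ 1
print("normalization of p(y|x0):", p_y_given_x0.sum() * dx)  # ~ 1
```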

The expectation value of a function $F(\mathbf{x})$ is obtained as before from

$$\langle F(\mathbf{x})\rangle = \int d^N\mathbf{x}\; p(\mathbf{x})\, F(\mathbf{x}). \tag{II.37}$$

The joint characteristic function is obtained from the $N$-dimensional Fourier transformation of the joint PDF,

$$\tilde{p}(\mathbf{k}) = \left\langle \exp\left(-i\sum_{j=1}^{N} k_j x_j\right)\right\rangle. \tag{II.38}$$

The joint moments and joint cumulants are generated by $\tilde{p}(\mathbf{k})$ and $\ln\tilde{p}(\mathbf{k})$ respectively, as

$$\langle x_1^{n_1} x_2^{n_2}\cdots x_N^{n_N}\rangle = \left[\frac{\partial}{\partial(-ik_1)}\right]^{n_1}\left[\frac{\partial}{\partial(-ik_2)}\right]^{n_2}\cdots\left[\frac{\partial}{\partial(-ik_N)}\right]^{n_N} \tilde{p}(\mathbf{k}=0),$$

$$\langle x_1^{n_1} x_2^{n_2}\cdots x_N^{n_N}\rangle_c = \left[\frac{\partial}{\partial(-ik_1)}\right]^{n_1}\left[\frac{\partial}{\partial(-ik_2)}\right]^{n_2}\cdots\left[\frac{\partial}{\partial(-ik_N)}\right]^{n_N} \ln\tilde{p}(\mathbf{k}=0). \tag{II.39}$$

The previously described graphical relation between joint moments (all clusters of labelled points) and joint cumulants (connected clusters) is still applicable. For example,

$$\langle x_\alpha x_\beta\rangle = \langle x_\alpha\rangle_c \langle x_\beta\rangle_c + \langle x_\alpha x_\beta\rangle_c, \quad\text{and}$$

$$\langle x_\alpha^2 x_\beta\rangle = \langle x_\alpha\rangle_c^2 \langle x_\beta\rangle_c + \langle x_\alpha^2\rangle_c \langle x_\beta\rangle_c + 2\langle x_\alpha x_\beta\rangle_c \langle x_\alpha\rangle_c + \langle x_\alpha^2 x_\beta\rangle_c. \tag{II.40}$$

The connected correlation, $\langle x_\alpha x_\beta\rangle_c$, is zero if $x_\alpha$ and $x_\beta$ are independent random variables.

The joint Gaussian distribution is the generalization of eq.(II.15) to $N$ random variables,

$$p(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^N \det[C]}}\exp\left[-\frac{1}{2}\, C^{-1}_{mn}\,(x_m - \lambda_m)(x_n - \lambda_n)\right], \tag{II.41}$$

where $C$ is a symmetric matrix and $C^{-1}$ is its inverse. The simplest way to get the normalization factor is to make a linear transformation from the variables $y_j = x_j - \lambda_j$, using the unitary matrix that diagonalizes $C$. This reduces the normalization to that of the product of $N$ Gaussians whose variances are determined by the eigenvalues of $C$. The product of the $N$ eigenvalues is the determinant $\det[C]$. (This also indicates that the matrix $C$ must be positive definite.) The corresponding joint characteristic function is obtained by similar manipulations, and is given by

$$\tilde{p}(\mathbf{k}) = \exp\left[-i\, k_m \lambda_m - \frac{1}{2}\, C_{mn}\, k_m k_n\right], \tag{II.42}$$

where the summation convention is used.
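As an added check (not part of the original notes), the normalization factor of eq.(II.41) can be verified numerically for $N = 2$; the covariance matrix $C$ and the means $\lambda$ below are arbitrary illustrative choices.

```python
# Check of the normalization in eq.(II.41) for N = 2; C and lam are arbitrary
# illustrative choices (C must be symmetric and positive definite).
import numpy as np

C = np.array([[2.0, 0.8],
              [0.8, 1.0]])
lam = np.array([0.5, -1.0])
Cinv = np.linalg.inv(C)

x = np.linspace(-10, 10, 801)
dx = x[1] - x[0]
X1, X2 = np.meshgrid(x, x, indexing="ij")
d1, d2 = X1 - lam[0], X2 - lam[1]

# quadratic form (x - lambda)^T C^{-1} (x - lambda)
quad = Cinv[0, 0] * d1**2 + 2 * Cinv[0, 1] * d1 * d2 + Cinv[1, 1] * d2**2
integral = np.exp(-0.5 * quad).sum() * dx * dx

print("numerical integral     :", integral)
print("sqrt((2 pi)^2 det[C])  :", np.sqrt((2 * np.pi) ** 2 * np.linalg.det(C)))
```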

The joint cumulants of the Gaussian are then obtained from $\ln\tilde{p}(\mathbf{k})$ as

$$\langle x_m\rangle_c = \lambda_m, \qquad \langle x_m x_n\rangle_c = C_{mn}, \tag{II.43}$$

with all higher cumulants equal to zero. In the special case of $\{\lambda_m\} = 0$, all odd moments of the distribution are zero, while the general rules for relating moments to cumulants indicate that any even moment is obtained by summing over all ways of grouping the involved random variables into pairs, e.g.

$$\langle x_a x_b x_c x_d\rangle = C_{ab}C_{cd} + C_{ac}C_{bd} + C_{ad}C_{bc}. \tag{II.44}$$

This result is sometimes referred to as Wick's theorem.
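Wick's theorem is easy to test by sampling. The following added sketch draws from a zero-mean four-variable Gaussian with an arbitrarily chosen positive-definite $C$ and compares the sample average $\langle x_a x_b x_c x_d\rangle$ with the sum over pairings in eq.(II.44).

```python
# Monte Carlo check of Wick's theorem, eq.(II.44); the covariance matrix
# below is an arbitrary positive-definite example.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
C = A @ A.T + np.eye(4)              # symmetric, positive definite

samples = rng.multivariate_normal(np.zeros(4), C, size=2_000_000)
a, b, c, d = 0, 1, 2, 3

lhs = np.mean(samples[:, a] * samples[:, b] * samples[:, c] * samples[:, d])
rhs = C[a, b] * C[c, d] + C[a, c] * C[b, d] + C[a, d] * C[b, c]

print("Monte Carlo <x_a x_b x_c x_d>:", lhs)
print("sum over pairings            :", rhs)
```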

II.E Sums of Random Variables & the Central Limit Theorem

Consider the sum $X = \sum_{i=1}^{N} x_i$, where the $x_i$ are random variables with a joint PDF $p(\mathbf{x})$. The PDF for $X$ is

$$p_X(x) = \int d^N\mathbf{x}\; p(\mathbf{x})\,\delta\!\left(x - \sum_i x_i\right) = \int \prod_{i=1}^{N-1} dx_i\; p\left(x_1, \cdots, x_{N-1},\, x - x_1 - \cdots - x_{N-1}\right), \tag{II.45}$$

and the corresponding characteristic function (using eq.(II.38)) is given by

$$\tilde{p}_X(k) = \left\langle \exp\left(-ik\sum_{j=1}^{N} x_j\right)\right\rangle = \tilde{p}(k_1 = k_2 = \cdots = k_N = k). \tag{II.46}$$

Cumulants of the sum are obtained by expanding $\ln\tilde{p}_X(k)$,

$$\ln\tilde{p}(k_1 = \cdots = k_N = k) = -ik\sum_{i_1=1}^{N}\langle x_{i_1}\rangle_c + \frac{(-ik)^2}{2}\sum_{i_1, i_2}\langle x_{i_1} x_{i_2}\rangle_c + \cdots, \tag{II.47}$$

as

$$\langle X\rangle_c = \sum_i \langle x_i\rangle_c, \qquad \langle X^2\rangle_c = \sum_{i,j}\langle x_i x_j\rangle_c, \quad\cdots. \tag{II.48}$$

If the random variables are independent, $p(\mathbf{x}) = \prod_i p_i(x_i)$, and $\tilde{p}_X(k) = \prod_i \tilde{p}_i(k)$. The cross-cumulants in eq.(II.48) vanish, and the $n^{\text{th}}$ cumulant of $X$ is simply the sum of the individual cumulants, $\langle X^n\rangle_c = \sum_{i=1}^{N}\langle x_i^n\rangle_c$. When all the $N$ random variables are independently taken from the same distribution $p(x)$, this implies $\langle X^n\rangle_c = N\langle x^n\rangle_c$, generalizing the result obtained previously for the binomial distribution. For large values of $N$, the average value of the sum is proportional to $N$, while fluctuations around the mean, as measured by the standard deviation, grow only as $\sqrt{N}$. The random variable $y = (X - N\langle x\rangle_c)/\sqrt{N}$ has zero mean, and cumulants that scale as $\langle y^m\rangle_c \propto N^{1 - m/2}$. As $N \to \infty$, only the second cumulant survives, and the PDF for $y$ converges to the normal distribution,

$$\lim_{N\to\infty} p\left(y = \frac{X - N\langle x\rangle_c}{\sqrt{N}}\right) = \frac{1}{\sqrt{2\pi\langle x^2\rangle_c}}\exp\left(-\frac{y^2}{2\langle x^2\rangle_c}\right). \tag{II.49}$$

(Note that the Gaussian distribution is the only distribution with only first and second cumulants.)

The convergence of the PDF for the sum of many random variables to a normal distribution is a most important result in the context of statistical mechanics, where such sums are frequently encountered. The central limit theorem states a more general form of this result: it is not necessary for the random variables to be independent, as the condition $\sum_{i_1,\cdots,i_m}\langle x_{i_1}\cdots x_{i_m}\rangle_c \ll \mathcal{O}\left(N^{m/2}\right)$ is sufficient for the validity of eq.(II.49).
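As an added illustration of eq.(II.49), the sketch below sums $N$ independent draws from a manifestly non-Gaussian distribution (a uniform distribution on $[0,1]$, an arbitrary choice), forms $y = (X - N\langle x\rangle_c)/\sqrt{N}$, and compares its low moments with the Gaussian prediction.

```python
# Illustration of the central limit theorem, eq.(II.49), using iid uniform
# variables on [0, 1]: <x>_c = 1/2 and <x^2>_c = 1/12.
import numpy as np

rng = np.random.default_rng(1)
N, trials = 1000, 10_000

x = rng.random((trials, N))              # iid uniform samples
X = x.sum(axis=1)                        # the sum X of eq.(II.45)
y = (X - N * 0.5) / np.sqrt(N)           # rescaled variable y

print("mean of y        :", y.mean(), "   (expect ~ 0)")
print("variance of y    :", y.var(), "   (expect <x^2>_c = 1/12 =", 1 / 12, ")")
print("<y^4>, 3<y^2>^2  :", np.mean(y**4), 3 * y.var() ** 2, "   (Gaussian prediction)")
```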

II.F Rules for Large Numbers

To describe equilibrium properties of macroscopic bodies, statistical mechanics has to deal with the very large number $N$ of microscopic degrees of freedom. Actually, taking the thermodynamic limit of $N \to \infty$ leads to a number of simplifications, some of which are described in this section.

There are typically three types of $N$ dependence encountered in the thermodynamic limit:

(a) Intensive quantities, such as temperature $T$, and generalized forces, e.g. pressure $P$ and magnetic field $\vec{B}$, are independent of $N$, i.e. $\mathcal{O}(N^0)$.

(b) Extensive quantities, such as energy $E$, entropy $S$, and generalized displacements, e.g. volume $V$ and magnetization $\vec{M}$, are proportional to $N$, i.e. $\mathcal{O}(N^1)$.

(c) Exponential dependence, i.e. $\mathcal{O}\left(\exp(N\phi)\right)$, is encountered in enumerating discrete micro-states, or computing available volumes in phase space.

Other asymptotic dependencies are certainly not ruled out a priori. For example, the Coulomb energy of $N$ ions at fixed density scales as $Q^2/R \sim N^{5/3}$. Such dependencies are rarely encountered in everyday physics. The Coulomb interaction of ions is quickly screened by counter-ions, resulting in an extensive overall energy. (This is not the case in astrophysical problems, since the gravitational energy can not be screened. For example, the entropy of a black hole is proportional to the square of its mass.)

In statistical mechanics we frequently encounter sums or integrals of exponential variables. Performing such sums in the thermodynamic limit is considerably simplified due to the following results.

(1) Summation of Exponential Quantities: Consider the sum

$$\mathcal{S} = \sum_{i=1}^{\mathcal{N}} \mathcal{E}_i, \tag{II.50}$$

where each term is positive, with an exponential dependence on $N$,

$$0 \leq \mathcal{E}_i \sim \mathcal{O}\left(\exp(N\phi_i)\right), \tag{II.51}$$

and the number of terms $\mathcal{N}$ is proportional to some power of $N$. Such a sum can be approximated by its largest term $\mathcal{E}_{\max}$, in the following sense. Since for each term in the sum, $0 \leq \mathcal{E}_i \leq \mathcal{E}_{\max}$,

$$\mathcal{E}_{\max} \leq \mathcal{S} \leq \mathcal{N}\,\mathcal{E}_{\max}. \tag{II.52}$$

An intensive quantity can be constructed from $\ln\mathcal{S}/N$, which is bounded by

$$\frac{\ln\mathcal{E}_{\max}}{N} \leq \frac{\ln\mathcal{S}}{N} \leq \frac{\ln\mathcal{E}_{\max}}{N} + \frac{\ln\mathcal{N}}{N}. \tag{II.53}$$

For $\mathcal{N} \propto N^p$, the ratio $\ln\mathcal{N}/N$ vanishes in the large $N$ limit, and

$$\lim_{N\to\infty}\frac{\ln\mathcal{S}}{N} = \frac{\ln\mathcal{E}_{\max}}{N} = \phi_{\max}. \tag{II.54}$$
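The largest-term approximation, eqs.(II.50)-(II.54), can be checked directly. In the added sketch below the exponents $\phi_i$ are a fixed, arbitrary set (so the number of terms certainly grows no faster than a power of $N$), and $\ln\mathcal{S}/N$ is seen to approach $\phi_{\max}$; np.logaddexp.reduce evaluates $\ln\sum_i e^{N\phi_i}$ without overflow.

```python
# Check of eqs.(II.50)-(II.54): ln(S)/N -> phi_max when each term is
# O(exp(N phi_i)) and the number of terms grows at most as a power of N.
import numpy as np

rng = np.random.default_rng(2)
phi = rng.uniform(0.0, 1.0, size=500)        # arbitrary intensive exponents phi_i
phi_max = phi.max()

for N in (10, 100, 1000, 10000):
    lnS = np.logaddexp.reduce(N * phi)       # ln sum_i exp(N*phi_i), computed stably
    print(f"N = {N:6d}   ln(S)/N = {lnS / N:.6f}   phi_max = {phi_max:.6f}")
```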

(2) Saddle Point Integration: Similarly, an integral of the form

$$\mathcal{I} = \int dx\; \exp\left(N\phi(x)\right) \tag{II.55}$$

can be approximated by the maximum value of the integrand, obtained at a point $x_{\max}$ which maximizes the exponent $\phi(x)$. Expanding the exponent around this point gives

$$\mathcal{I} = \int dx\; \exp\left\{N\left[\phi(x_{\max}) - \frac{1}{2}\left|\phi''(x_{\max})\right|(x - x_{\max})^2 + \cdots\right]\right\}. \tag{II.56}$$

Note that at the maximum the first derivative $\phi'(x_{\max})$ is zero, while the second derivative $\phi''(x_{\max})$ is negative. Terminating the series at the quadratic order results in

$$\mathcal{I} \approx e^{N\phi(x_{\max})}\int dx\; \exp\left[-\frac{N}{2}\left|\phi''(x_{\max})\right|(x - x_{\max})^2\right] \approx \sqrt{\frac{2\pi}{N\left|\phi''(x_{\max})\right|}}\; e^{N\phi(x_{\max})}, \tag{II.57}$$

where the range of integration has been extended to $[-\infty, \infty]$. The latter is justified since the integrand is negligibly small outside the neighborhood of $x_{\max}$.

There are two types of corrections to the above result. Firstly, there are higher-order terms in the expansion of $\phi(x)$ around $x_{\max}$. These corrections can be looked at perturbatively, and lead to a series in powers of $1/N$. Secondly, there may be additional local maxima of the function. A maximum at $x'_{\max}$ leads to a similar Gaussian integral that can be added to eq.(II.57). Clearly such contributions are smaller by $\mathcal{O}\left(\exp\left\{-N\left[\phi(x_{\max}) - \phi(x'_{\max})\right]\right\}\right)$. Since all these corrections vanish in the thermodynamic limit,

$$\lim_{N\to\infty}\frac{\ln\mathcal{I}}{N} = \lim_{N\to\infty}\left[\phi(x_{\max}) - \frac{1}{2N}\ln\left(\frac{N\left|\phi''(x_{\max})\right|}{2\pi}\right) + \mathcal{O}\left(\frac{1}{N}\right)\right] = \phi(x_{\max}). \tag{II.58}$$

The saddle point method for evaluating integrals is the extension of the above result to more general integrands, and integration paths in the complex plane. (The appropriate extremum in the complex plane is a saddle point.) The simplified version presented above is sufficient for the purposes of this course.
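As an added numerical check of eqs.(II.55)-(II.58), the sketch below takes the arbitrary choice $\phi(x) = \sin x$ on $[0, \pi]$ (maximum at $x_{\max} = \pi/2$ with $\phi''(x_{\max}) = -1$) and compares the directly evaluated integral with the saddle-point estimate of eq.(II.57); both are scaled by $e^{-N}$ to avoid overflow.

```python
# Saddle-point estimate, eq.(II.57), versus direct numerical integration
# for the illustrative choice phi(x) = sin(x) on [0, pi].
import numpy as np

def scaled_integral(N, pts=200_001):
    """Return I * exp(-N), where I = integral_0^pi exp(N sin x) dx."""
    x = np.linspace(0.0, np.pi, pts)
    dx = x[1] - x[0]
    return np.exp(N * (np.sin(x) - 1.0)).sum() * dx

for N in (10, 100, 1000):
    exact = scaled_integral(N)               # I * e^{-N}, up to quadrature error
    saddle = np.sqrt(2 * np.pi / N)          # eq.(II.57) with |phi''| = 1, times e^{-N}
    print(f"N = {N:5d}   exact = {exact:.6f}   saddle = {saddle:.6f}   "
          f"ratio = {exact / saddle:.4f}")
```

The ratio approaches one as $N$ grows, with the leading deviation of order $1/N$, as stated below eq.(II.57).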

for all. While the integral in eq.(ii.61) is not exactly in the form of eq.(ii.55), it can still be evaluated by a similar method. The integrand can be written as exp φ(x), with φ(x) = lnx x/. The exponent has a maximum at x max =, with φ(x max ) = ln 1, and φ (x max ) = 1/. Expanding the integrand in eq.(ii.61) around this point yields, ( 1! dx exp ln (x )) e π, (II.6) where the integral is evaluated by extending its limits to [, ]. Stirling s formula is obtained by taking the logarithm of eq.(ii.6) as, 1 1 ln! = ln + ln(π) + O. (II.63) II.G Information, Entropy, and Estimation Information: Consider a random variable with a discrete set of outcomes S = {x i }, occurring with probabilities {p(i)}, for i = 1,, M. In the context of information theory, there is a precise meaning to the information content of a probability distribution: Let us construct a message from independent outcomes of the random variable. Since there are M possibilities for each character in this message, it has an apparent information content of ln M bits; i.e. this many binary bits of information have to be transmitted to convey the message precisely. On the other hand, the probabilities {p(i)} limit the types of messages that are likely. For example, if p p 1, it is very unlikely to construct a message with more x 1 than x. In particular, in the limit of large, we expect the message to contain roughly { i = p i } occurrences of each symbol. The number of typical messages thus corresponds to the number of ways of rearranging the { i } occurrences of {x i }, and is given by the multinomial coefficient! g = M. (II.64) i! This is much smaller than the total number of messages M n. To specify one out of g possible sequences requires M ln g p i ln p i (for ), (II.65) More precisely, the probability of finding any i that is different from p i by more than ± becomes exponentially small in, as. 38

II.G Information, Entropy, and Estimation

Information: Consider a random variable with a discrete set of outcomes $S = \{x_i\}$, occurring with probabilities $\{p(i)\}$, for $i = 1, \cdots, M$. In the context of information theory, there is a precise meaning to the information content of a probability distribution. Let us construct a message from $N$ independent outcomes of the random variable. Since there are $M$ possibilities for each character in this message, it has an apparent information content of $N\log_2 M$ bits; i.e. this many binary bits of information have to be transmitted to convey the message precisely. On the other hand, the probabilities $\{p(i)\}$ limit the types of messages that are likely. For example, if $p_2 \gg p_1$, it is very unlikely to construct a message with more $x_1$ than $x_2$. In particular, in the limit of large $N$, we expect the message to contain roughly $\{N_i \approx N p_i\}$ occurrences of each symbol. (More precisely, the probability of finding any $N_i$ that is different from $N p_i$ by more than $\pm\sqrt{N}$ becomes exponentially small in $N$, as $N \to \infty$.) The number of typical messages thus corresponds to the number of ways of rearranging the $\{N_i\}$ occurrences of $\{x_i\}$, and is given by the multinomial coefficient

$$g = \frac{N!}{\prod_{i=1}^{M} N_i!}. \tag{II.64}$$

This is much smaller than the total number of messages $M^N$. To specify one out of $g$ possible sequences requires

$$\log_2 g \approx -N\sum_{i=1}^{M} p_i \log_2 p_i \quad (\text{for } N \to \infty) \tag{II.65}$$

bits of information. The last result is obtained by applying Stirling's approximation for $\ln N!$. It can also be obtained by noting that

$$1 = \left(\sum_i p_i\right)^N = \sum_{\{N_i\}}\frac{N!}{\prod_i N_i!}\prod_i p_i^{N_i} \approx g\prod_i p_i^{N p_i}, \tag{II.66}$$

where the sum has been replaced by its largest term, as justified in the previous section. Shannon's theorem proves more rigorously that the minimum number of bits necessary to ensure that the percentage of errors in $N$ trials vanishes in the $N \to \infty$ limit is $\log_2 g$. For any non-uniform distribution, this is less than the $N\log_2 M$ bits needed in the absence of any information on relative probabilities. The difference per trial is thus attributed to the information content of the probability distribution, and is given by

$$I\left[\{p_i\}\right] = \log_2 M + \sum_{i=1}^{M} p_i\log_2 p_i. \tag{II.67}$$

Entropy: Eq.(II.64) is encountered frequently in statistical mechanics in the context of mixing $M$ distinct components; its natural logarithm is related to the entropy of mixing. More generally, we can define an entropy for any probability distribution as

$$S = -\sum_{i=1}^{M} p(i)\ln p(i) = -\langle\ln p(i)\rangle. \tag{II.68}$$

The above entropy takes a minimum value of zero for the delta-function distribution $p(i) = \delta_{i,j}$, and a maximum value of $\ln M$ for the uniform distribution, $p(i) = 1/M$. $S$ is thus a measure of dispersity (disorder) of the distribution, and does not depend on the values of the random variables $\{x_i\}$. A one-to-one mapping to $f_i = F(x_i)$ leaves the entropy unchanged, while a many-to-one mapping makes the distribution more ordered and decreases $S$. For example, if the two values $x_1$ and $x_2$ are mapped onto the same $f$, the change in entropy is

$$\Delta S(x_1, x_2 \to f) = p_1\ln\left(\frac{p_1}{p_1 + p_2}\right) + p_2\ln\left(\frac{p_2}{p_1 + p_2}\right) < 0. \tag{II.69}$$
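As an added illustration of eqs.(II.68)-(II.69), the sketch below computes the entropy of a small, arbitrarily chosen discrete distribution and verifies that merging two outcomes into one lowers it by exactly the $\Delta S$ of eq.(II.69).

```python
# Entropy of a discrete distribution, eq.(II.68), and the entropy change
# eq.(II.69) when two outcomes x1, x2 are mapped onto the same value f.
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p))

p = np.array([0.1, 0.2, 0.3, 0.4])               # arbitrary example distribution

S_before = entropy(p)
p_merged = np.array([p[0] + p[1], p[2], p[3]])   # map x1 and x2 onto one outcome
S_after = entropy(p_merged)

# eq.(II.69): Delta S = p1 ln(p1/(p1+p2)) + p2 ln(p2/(p1+p2)) < 0
dS_formula = p[0] * np.log(p[0] / (p[0] + p[1])) + p[1] * np.log(p[1] / (p[0] + p[1]))

print("S before merge      :", S_before)
print("S after merge       :", S_after)
print("S_after - S_before  :", S_after - S_before, "   formula:", dS_formula)
```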

Estimation: The entropy $S$ can also be used to quantify subjective estimates of probabilities. In the absence of any information, the best unbiased estimate is that all $M$ outcomes are equally likely. This is the distribution of maximum entropy. If additional information is available, the unbiased estimate is obtained by maximizing the entropy subject to the constraints imposed by this information. For example, if it is known that $\langle F(x)\rangle = f$, we can maximize

$$S\left(\alpha, \beta, \{p_i\}\right) = -\sum_i p(i)\ln p(i) - \alpha\left(\sum_i p(i) - 1\right) - \beta\left(\sum_i p(i)F(x_i) - f\right), \tag{II.70}$$

where the Lagrange multipliers $\alpha$ and $\beta$ are introduced to impose the constraints of normalization and $\langle F(x)\rangle = f$, respectively. The result of the optimization is a distribution $p(i) \propto \exp\left(-\beta F(x_i)\right)$, where the value of $\beta$ is fixed by the constraint. This process can be generalized to an arbitrary number of conditions. It is easy to see that if the first $n$ moments (and hence $n$ cumulants) of a distribution are specified, the unbiased estimate is the exponential of an $n^{\text{th}}$-order polynomial.

In analogy with eq.(II.68), we can define an entropy for a continuous random variable ($S_x = \{-\infty < x < \infty\}$) as

$$S = -\int dx\; p(x)\ln p(x) = -\langle\ln p(x)\rangle. \tag{II.71}$$

There are, however, problems with this definition; for example, $S$ is not invariant under a one-to-one mapping. (After a change of variable to $f = F(x)$, the entropy is changed by $\langle\ln\left|dF/dx\right|\rangle$.) Since the Jacobian of a canonical transformation is unity, canonically conjugate pairs offer a suitable choice of coordinates in classical statistical mechanics. The ambiguities are also removed if the continuous variable is discretized. This happens quite naturally in quantum statistical mechanics, where it is usually possible to work with a discrete ladder of states. The appropriate volume for discretization of phase space is set by Planck's constant $h$.
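A short added sketch of the maximum-entropy estimate discussed above: for outcomes $x_i$ with a prescribed average $\langle F(x)\rangle = f$, the optimal distribution takes the form $p_i \propto \exp\left(-\beta F(x_i)\right)$, and $\beta$ is fixed here by bisection on the constraint. The choices $F(x) = x$, the outcomes $0, \ldots, 5$, and $f = 1.7$ are arbitrary.

```python
# Maximum-entropy distribution subject to a mean constraint, eq.(II.70):
# the result is p_i proportional to exp(-beta * F(x_i)), with beta fixed
# by <F(x)> = f. Here F(x) = x, the outcomes are 0..5, and f = 1.7.
import numpy as np

x = np.arange(6, dtype=float)                # outcomes x_i
f_target = 1.7                               # prescribed average <x>

def mean_for(beta):
    e = -beta * x
    w = np.exp(e - e.max())                  # stable against overflow
    p = w / w.sum()
    return p, (p * x).sum()

# bisection on beta: <x> decreases monotonically as beta increases
lo, hi = -50.0, 50.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    _, m = mean_for(mid)
    if m > f_target:
        lo = mid                             # mean too large -> need larger beta
    else:
        hi = mid
p_opt, mean = mean_for(0.5 * (lo + hi))

print("beta :", 0.5 * (lo + hi))
print("p_i  :", np.round(p_opt, 4))
print("<x>  :", mean, "   (target", f_target, ")")
```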

MIT OpenCourseWare
http://ocw.mit.edu

8.333 Statistical Mechanics I: Statistical Mechanics of Particles
Fall 2013

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.