The Ergodic Theorem and randomness



Similar documents
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES

RANDOM INTERVAL HOMEOMORPHISMS. MICHA L MISIUREWICZ Indiana University Purdue University Indianapolis

The sample space for a pair of die rolls is the set. The sample space for a random number between 0 and 1 is the interval [0, 1].

Notes on metric spaces

Lecture 2: Universality

x a x 2 (1 + x 2 ) n.

CS154. Turing Machines. Turing Machine. Turing Machines versus DFAs FINITE STATE CONTROL AI N P U T INFINITE TAPE. read write move.

Gambling Systems and Multiplication-Invariant Measures

Undergraduate Notes in Mathematics. Arkansas Tech University Department of Mathematics

Separation Properties for Locally Convex Cones

Online Appendix to Stochastic Imitative Game Dynamics with Committed Agents

Polarization codes and the rate of polarization

Maximum Likelihood Estimation

Probability Theory. Florian Herzog. A random variable is neither random nor variable. Gian-Carlo Rota, M.I.T..

1. Prove that the empty set is a subset of every set.

Computability Theory

INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS

CHAPTER 6. Shannon entropy

0 <β 1 let u(x) u(y) kuk u := sup u(x) and [u] β := sup

MEASURE AND INTEGRATION. Dietmar A. Salamon ETH Zürich

Random access protocols for channel access. Markov chains and their stability. Laurent Massoulié.

Pacific Journal of Mathematics

Lecture Notes on Elasticity of Substitution

4 Lyapunov Stability Theory

HOMEWORK 5 SOLUTIONS. n!f n (1) lim. ln x n! + xn x. 1 = G n 1 (x). (2) k + 1 n. (n 1)!

ALMOST COMMON PRIORS 1. INTRODUCTION

Deterministic Finite Automata

Metric Spaces. Chapter Metrics

and s n (x) f(x) for all x and s.t. s n is measurable if f is. REAL ANALYSIS Measures. A (positive) measure on a measurable space

SOME APPLICATIONS OF MARTINGALES TO PROBABILITY THEORY

CHAPTER IV - BROWNIAN MOTION

Lecture 4: BK inequality 27th August and 6th September, 2007

Estimating the Degree of Activity of jumps in High Frequency Financial Data. joint with Yacine Aït-Sahalia

Techniques algébriques en calcul quantique

(IALC, Chapters 8 and 9) Introduction to Turing s life, Turing machines, universal machines, unsolvable problems.

SOLUTIONS TO EXERCISES FOR. MATHEMATICS 205A Part 3. Spaces with special properties

Gambling and Data Compression

Lecture Notes on Measure Theory and Functional Analysis

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh

Binomial lattice model for stock prices

BANACH AND HILBERT SPACE REVIEW

1 if 1 x 0 1 if 0 x 1

arxiv: v1 [math.pr] 5 Dec 2011

On strong fairness in UNITY

An example of a computable

n k=1 k=0 1/k! = e. Example 6.4. The series 1/k 2 converges in R. Indeed, if s n = n then k=1 1/k, then s 2n s n = 1 n

6.2 Permutations continued

Regular Expressions with Nested Levels of Back Referencing Form a Hierarchy

Continuity of the Perron Root

Omega Automata: Minimization and Learning 1

No: Bilkent University. Monotonic Extension. Farhad Husseinov. Discussion Papers. Department of Economics

An Introduction to Competitive Analysis for Online Optimization. Maurice Queyranne University of British Columbia, and IMA Visitor (Fall 2002)

Date: April 12, Contents

On the Unique Games Conjecture

CS 3719 (Theory of Computation and Algorithms) Lecture 4

Stochastic Inventory Control

This asserts two sets are equal iff they have the same elements, that is, a set is determined by its elements.

6.207/14.15: Networks Lecture 15: Repeated Games and Cooperation

The Heat Equation. Lectures INF2320 p. 1/88

Continued Fractions and the Euclidean Algorithm

ON INITIAL SEGMENT COMPLEXITY AND DEGREES OF RANDOMNESS

4.6 The Primitive Recursive Functions

Chapter 4 Lecture Notes

Decision & Risk Analysis Lecture 6. Risk and Utility

GenOpt (R) Generic Optimization Program User Manual Version 3.0.0β1

Notes on Complexity Theory Last updated: August, Lecture 1

Bayesian logistic betting strategy against probability forecasting. Akimichi Takemura, Univ. Tokyo. November 12, 2012

Numerical methods for American options

1 The Line vs Point Test

Theory of Computation Chapter 2: Turing Machines

Basics of Statistical Machine Learning

Near Optimal Solutions

t := maxγ ν subject to ν {0,1,2,...} and f(x c +γ ν d) f(x c )+cγ ν f (x c ;d).

(Basic definitions and properties; Separation theorems; Characterizations) 1.1 Definition, examples, inner description, algebraic properties

Notes on Continuous Random Variables

CHAPTER 6: Continuous Uniform Distribution: 6.1. Definition: The density function of the continuous random variable X on the interval [A, B] is.

The Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Line-search Method

CHAPTER II THE LIMIT OF A SEQUENCE OF NUMBERS DEFINITION OF THE NUMBER e.

Introduction to Scheduling Theory

Metric Spaces Joseph Muscat 2003 (Last revised May 2009)

Walrasian Demand. u(x) where B(p, w) = {x R n + : p x w}.


Local periods and binary partial words: An algorithm

1 Definition of a Turing machine

9 More on differentiation

NONLOCAL PROBLEMS WITH NEUMANN BOUNDARY CONDITIONS

Stationary random graphs on Z with prescribed iid degrees and finite mean connections

1 Calculus of Several Variables

1 The EOQ and Extensions

MTH6120 Further Topics in Mathematical Finance Lesson 2

Lecture 13: Martingales

TOPIC 4: DERIVATIVES

Transcription:

The Ergodic Theorem and randomness Peter Gács Department of Computer Science Boston University March 19, 2008 Peter Gács (Boston University) Ergodic theorem March 19, 2008 1 / 27

Introduction Introduction I will present some results of V. V. V yugin that I find interesting. The motivation is that I wanted to present a topic here that is connected with the theory of randomness but at the same time relates to interests of the audience. I assume that ergodic theory falls into the latter category. Peter Gács (Boston University) Ergodic theorem March 19, 2008 2 / 27

Introduction The train of thought 1 In the theory of randomness, random sequences are defined as those that do not fall into any constructive null sets (see later). 2 Such sequences can be expected to satisfy the almost every -type laws of probability theory. Why? Since the proofs of those laws generally enclose the violating sequences into a constructive null set. 3 Example: the strong law of large numbers. The proofs also yields a speed of convergence. 4 In the proof of the Ergodic Theorem, no speed of convergence is obtained. V yugin s Theorem 1: sometimes, there is no computable speed of convergence here, even in probability. 5 V yugin s Theorem 2: the proof of the Ergodic Theorem can still be constructivized, showing that all random sequences obey the Ergodic Theorem. Peter Gács (Boston University) Ergodic theorem March 19, 2008 3 / 27

Introduction Content The non-effective convergence. Notions of randomness. Constructive content of the Ergodic Theorem. A characterization of randomness via complexity. Peter Gács (Boston University) Ergodic theorem March 19, 2008 4 / 27

Introduction I assume that the audience is not expert in the theory of computability, but the extent to which it is used here should be understandable. Let Σ be an alphabet and Σ the set of strings in this alphabet. We will work with an intuitive notion of algorithms. The following is known. Fix some computer (a Turing machine, but is OK to think of an ordinary computer). It defines a function A(i, x) as follows: we start the computer with program i and input x. If it stops with some output y we set A(i, x) = y, else A(i, x) is undefined. We call A(i, x) a universal computable function since it is known that for every function f : Σ Σ computable with the help of any algorithm there is a computer program i such that f(x) = A(i, x) for all x. Peter Gács (Boston University) Ergodic theorem March 19, 2008 5 / 27

Introduction Consider a measure µ over the set Σ of sequences in the alphabet Σ. It is determined by the values for all segments x Σ. Definition µ(x) := µ(xσ ) Such a measure µ is called computable if there is a computable function µ(x, ɛ) with rational values such that for all rational ɛ we have µ(x) µ(x, ɛ) < ɛ. Peter Gács (Boston University) Ergodic theorem March 19, 2008 6 / 27

Introduction Let X 1, X 2,... be a sequence of random variables and Y a random variable. As usual, we say X n Y in probability if P X n Y > δ 0 for all δ. Definition The convergence above is effective if there is a computable integer-valued function m(δ, ɛ) such that for all rational ɛ, δ > 0 and all n, n > m(δ, ɛ) we have P X n X n > δ < ɛ. It is easy to check that, for example, Chebyshev s inequality provides algorithmically effective convergence in the Law of Large Numbers (where applicable). Peter Gács (Boston University) Ergodic theorem March 19, 2008 7 / 27

The non-effectiveness theorem The non-effectiveness theorem Consider a measure over the set of infinite 0-1 sequences giving rise to the stationary sequence of random variables X 1, X 2,..., with S n = n i=1 X i. By the (weak) Ergodic Theorem there is a random variable Y with the property that S n /n Y in probability. Theorem There is a computable measure of the above form such that the above convergence in probability is not effective. We will do a little better, showing that there is not even a computable function m(ɛ) with P S n /n S n /n > 1 4 < ɛ for all n, n m(ɛ). Peter Gács (Boston University) Ergodic theorem March 19, 2008 8 / 27

The non-effectiveness theorem We will define the stationary measure µ as the mixture µ = 2 i P i, i=1 with the help of the auxiliary processes P i as follows. Let K(i, ɛ) be a universal computable function (for rational argument ɛ). Let t(i, ɛ) be the number of steps for the algorithm to compute K(i, ɛ) ( if the algorithm does not terminate). We can assume t(i, ɛ) max(k(i, ɛ), 10). Let k(i) = t(i, 2 (i+1) ). Let P i be a simple two-state Markov process with states 0,1, with P i [X 1 = 0] = P i [X 1 = 1] = 1, and with the following transition 2 probabilities: P X s+1 X s = αi = 2 k(i). Peter Gács (Boston University) Ergodic theorem March 19, 2008 9 / 27

The non-effectiveness theorem The process is designed to thwart every possible computable convergence bound m(ɛ). Assume m(ɛ) = K(i, ɛ), then α i = 2 k(i) > 0, hence the Markov chain P i satisfies Further, we have P i Si /n 1 2 < 0.1 1. P i (0 k(i) ) = P i (1 k(i) ) = 1 2 (1 α i) k(i) 1 > 1 2 (1 (k(i) 1)α i) = 1 2 (1 (k(i) 1)2 k(i) ) > 2 5 since k(i) > 10. Hence for all sufficiently large n we have P i Sk(i) /k(i) S n /n > 1 4 Pi Sk(i) /k(i) {0, 1} > 4 5, µ S k(i) /k(i) S n /n > 1 4 2 i P i Sk(i) /k(i) S n /n > 1 4 > 2 (i+1). But k(i) = t(i, 2 (i+1) ) m(2 (i+1) ), so m(2 (i+1) ) fails. Peter Gács (Boston University) Ergodic theorem March 19, 2008 10 / 27

The non-effectiveness theorem Open question This example is not ergodic. Is there an ergodic example? This example is a countable Markov chain. Is there an example that is an ergodic Markov chain? (If not, this would mean that every computable ergodic Markov chain has a computable convergence speed in the law of large numbers.) Peter Gács (Boston University) Ergodic theorem March 19, 2008 11 / 27

Randomness Randomness Let Ω = Σ be the space of infinite sequences. Definition A cylinder is of the form Γ x = xσ for some x Σ. An open set is the union of cylinders. It is constructively open if it is i Γ x i for a computable sequence x 1, x 2,.... A sequence of sets G 1, G 2,... is uniformly constructively open if there is a recursive function x(i, j) such that G i = j Γ x(i,j). Let be the set of Borel sets generated by these open sets. Then (Ω, ) is a measureable space. Let µ be a computable probability measure on (Ω, ), then we have a measure space (Ω,, µ). Peter Gács (Boston University) Ergodic theorem March 19, 2008 12 / 27

Randomness If N is a set of measure 0 then for all ɛ > 0, it can be covered by an open set G ɛ with µ(g ɛ ) < ɛ. Definition The set N is a constructive nullset if for rational ɛ the above sets G ɛ can be chosen to be a sequence of uniformly constructively open sets. There is only a countable number of constructive nullsets. Definition An element ω is random if it is not in any constructive set of measure 0. Peter Gács (Boston University) Ergodic theorem March 19, 2008 13 / 27

Randomness Let us introduce a useful equivalent definition. Definition A function f : Σ is lower semicomputable if the set { ω : f(ω) > r } is a constructive open set, uniformly in the rational number r. A function t : Ω + is a payoff test with respect to measure µ if 1 It is lower semicomputable. 2 We have t(ω)dµ 1. Proposition An element ω is random if and only if for every payoff test t we have t(ω) <. Peter Gács (Boston University) Ergodic theorem March 19, 2008 14 / 27

Randomness We can imagine the function t(ω) as the payoff function of a fair bet against ω. We pay 1 dollar for the game, and ω pays us t(ω). The bet is fair since the expected bet is t(ω)dµ 1. The bet is lower semicomputable, allowing to increase our win as we discover new and new regularities in ω. The element is random if we cannot win an infinite amount against it. Peter Gács (Boston University) Ergodic theorem March 19, 2008 15 / 27

The ergodic theorem The ergodic theorem Definition A function f : Σ Σ is a computable transformation if there is a program i for our machine that, reading sequentially the symbols of any input sequence s 1 s 2..., computes and outputs gradually more-and-more symbols of an output sequence t 1 t 2 = f(s 1 s 2... ). Note that a computable transformation is always continuous in the topology of Σ. Peter Gács (Boston University) Ergodic theorem March 19, 2008 16 / 27

The ergodic theorem Let T be computable measure-preserving transformation over (Ω,, µ) with Ω = Σ, and let f(ω) be an integrable function with S n (ω) = n 1 i=0 f(ti ω). Theorem (Constructive Ergodic Theorem) For all random ω we have for a certain integrable f. S n (ω)/n f(ω) For the proof, we define a payoff test for the set of points ω for which S n (ω)/n does not converge. Note that in this case there are rational α, β with lim inf n S n (ω)/n < α < β < lim sup S n (ω)/n. n Peter Gács (Boston University) Ergodic theorem March 19, 2008 17 / 27

The ergodic theorem So there are infinitely many u, v with S u (ω) uα < 0 < S v (ω) vβ. Let u ω v v ω u S u (ω) uα < S v (ω) vβ, ( 1) ω v 0 < S v (ω) vβ. A sequence s = (u 1, v 1,..., u N, v N ) is n-admissible if 1 u 1 < v 1 u 2 < v 2... u N < v N n. We denote s = N. Let σ n (ω, α, β) = max{ N : u 1 ω v1 ω u2 ω v2 ω ω vn } where (u 1, v 1,..., u N, v N ) is running over all n-admissible sequences. Peter Gács (Boston University) Ergodic theorem March 19, 2008 18 / 27

The ergodic theorem It is easy to check that σ n (ω, α, β) is lower semicomputable, and so is σ(ω, α, β) = sup σ n (ω, α, β). n The tough part of the proof is to show σ(ω, α, β)dµ < C(α, β) for a constant C(α, β). Then we can construct a payoff test t(ω) combined from all the σ(ω, α, β) (with constant weights). If S n (ω)/n does not converge then σ(ω, α, β) = for an appropriate α, β, so t(ω) =. Summarizing: we do not get a speed of convergence, but we get an effective probabilistic bound on the maximum number of oscillations through any interval [α, β]. Peter Gács (Boston University) Ergodic theorem March 19, 2008 19 / 27

The ergodic theorem To estimate σ n (ω, α, β) we introduce a quantity: Definition The oscillation cost of an admissible sequence d = (s 1, t 1,..., s N, t N ) is N S(d, ω) = (Svi v i β) (S ui u i α). j=1 The following lemma brings us close: Lemma (Cost shift) For every n-admissible sequence q there is an n-admissible sequence r with S(q, ω) S(r, Tω) + f(ω) α + (β α)σ n (ω, α, β). Peter Gács (Boston University) Ergodic theorem March 19, 2008 20 / 27

The ergodic theorem Let us apply this lemma. With λ n (ω) = sup{ S(d, ω) : d is n-admissible }, we get: (β α) λ n (ω) λ n (Tω) + f(ω) α + (β α)σ n (ω, α, β), (β α)σ n (ω, α, β) λ n (Tω) λ n (Tω) + f(ω) α +, σ n (ω, α, β)dµ f(ω) α + dµ, σ(ω, α, β)dµ (β α) 1 f(ω) α + dµ, and we will be done. Peter Gács (Boston University) Ergodic theorem March 19, 2008 21 / 27

The ergodic theorem To prove the cost shift lemma, the following lemma is used: Lemma (Combinatorial) For a given ω and any n-admissible sequence q there is an n-admissible sequence d with S(d, ω) S(q, ω) and d = σ n (ω, α, β). Its proof would take some work. Also, define the shift for an admissible sequence d = (s 1, t 1,..., s m, t m ): (s 1 1, t 1 1,..., s m 1, t m 1) if s 1 0, d = ( 1, t 1 1,..., s m 1, t m 1) if s 1 = 1, t 1 > 0, (s 2 1, t 2 1,..., s m 1, t m 1) otherwise. The following can be verified directly: S(d, ω) S(d, Tω) + f(ω) α + (β α)m. Peter Gács (Boston University) Ergodic theorem March 19, 2008 22 / 27

The ergodic theorem Using the combinatorial lemma, for all n-admissible q there is an n-admissible d with with S(d, ω) S(q, ω) and d = σ n (ω, α, β). Applying it: S(q, ω) S(d, ω) S(d, Tω) + f(ω) α + (β α)σ n (ω, α, β), which proves the cost shift lemma. Peter Gács (Boston University) Ergodic theorem March 19, 2008 23 / 27

The ergodic theorem History Birkhoff Bishop V yugin. (With some credit to Lambalgen.) Peter Gács (Boston University) Ergodic theorem March 19, 2008 24 / 27

Randomness and complexity Randomness and complexity It turns out that there is one payoff test function encompassing all: Proposition (Universal test) Let us be given a computable measure over (Ω, ) where Ω = Σ. Then there is a universal payoff test u(ω) for µ in the sense that for every other payoff test t(ω) there is a c t such that for all ω we have c t u(ω) t(ω). From now on, we fix a universal payoff test u(ω). In order to understand what it measures, we define the notion of description complexity. Peter Gács (Boston University) Ergodic theorem March 19, 2008 25 / 27

Randomness and complexity Definition A (partial) computable function A : {0, 1} Σ is called an interpreter if its domain of definition is prefix-free: if A(p) and A(q) are both defined then p is not the prefix of q. If A( ) is an interpreter then we define the description complexity: K A (x) = min{ p : A(p) = x }. A theorem similar to the existence of a universal test is the existence of an optimal interpreter: Proposition (Invariance) There is an optimal interpreter U in the sense that for every other interpreter A there is a constant c A such that for all x we have K U (x) K A (x) + c A. From now on we fix a universal interpreter U and write K(x) = K U (x). Peter Gács (Boston University) Ergodic theorem March 19, 2008 26 / 27

Randomness and complexity The following theorem characterizes a universal payoff test in terms of complexity: Proposition (Test characterization) Let µ be a computable measure over the measureable space (Ω, ) with Ω = Σ. Let ω n denote the prefix of length n of the infinite sequence ω. Then we have log u(ω) = sup( log µ(ω n ) K(ω n )) + O(1). n Thus, ω is random if and only if the complexity of each of its prefixes x is never much smaller than log µ(x). In a sense, such a nice characterization justifies the appropriateness of both the randomness notion and the complexity notion. Peter Gács (Boston University) Ergodic theorem March 19, 2008 27 / 27