Basics of information theory and information complexity
1 Basics of information theory and information complexity: a tutorial. Mark Braverman, Princeton University. June 1
2 Part I: Information theory
Information theory, in its modern form, was introduced in the 1940s to study the problem of transmitting data over physical channels.
[Figure: Alice sends data to Bob over a communication channel.]
3 Quantifying information
Information is measured in bits. The basic notion is Shannon's entropy. The entropy of a random variable is the (typical) number of bits needed to remove the uncertainty of the variable. For a discrete variable:
H(X) ≔ Σ_x Pr[X = x] · log(1/Pr[X = x]).
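The definition above is easy to check numerically; the following is a minimal sketch (the dictionary representation of a distribution is just an illustrative choice).

```python
from math import log2

def entropy(dist):
    """Shannon entropy, in bits, of a distribution given as {outcome: probability}."""
    return sum(p * log2(1.0 / p) for p in dist.values() if p > 0)

print(entropy({"H": 0.5, "T": 0.5}))          # a fair coin: 1.0 bit
print(entropy({"x": 1.0}))                    # a constant: 0.0 bits
print(entropy({i: 1 / 8 for i in range(8)}))  # uniform on 8 values: 3.0 bits
```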
4 Shannon's entropy
Important examples and properties:
If X = x₀ is a constant, then H(X) = 0.
If X is uniform on a finite set S of possible values, then H(X) = log |S|.
If X is supported on at most n values, then H(X) ≤ log n.
If Y is a random variable determined by X, then H(Y) ≤ H(X).
5 Conditional entropy
For two (potentially correlated) variables X, Y, the conditional entropy of X given Y is the amount of uncertainty left in X given Y:
H(X|Y) ≔ E_{y∼Y} H(X|Y=y).
One can show H(XY) = H(Y) + H(X|Y). This important fact is known as the chain rule.
If X ⊥ Y, then H(XY) = H(X) + H(Y|X) = H(X) + H(Y).
6 Example
X = (B₁, B₂, B₃), Y = (B₁⊕B₂, B₂⊕B₄, B₃⊕B₄, B₅), where B₁, …, B₅ ∼ U{0,1} are independent. Then:
H(X) = 3; H(Y) = 4; H(XY) = 5;
H(X|Y) = 1 = H(XY) − H(Y); H(Y|X) = 2 = H(XY) − H(X).
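The numbers in this example can be confirmed by brute force over all 2⁵ assignments of the underlying bits; in the sketch below the helper names are mine.

```python
from itertools import product
from math import log2

def H(f, assignments):
    """Entropy of f(bits) when the bits are uniform: brute-force f's distribution."""
    counts = {}
    for bits in assignments:
        v = f(bits)
        counts[v] = counts.get(v, 0) + 1
    n = len(assignments)
    return sum((c / n) * log2(n / c) for c in counts.values())

bits5 = list(product((0, 1), repeat=5))
X = lambda b: (b[0], b[1], b[2])
Y = lambda b: (b[0] ^ b[1], b[1] ^ b[3], b[2] ^ b[3], b[4])
XY = lambda b: (X(b), Y(b))

print(H(X, bits5), H(Y, bits5), H(XY, bits5))  # 3.0 4.0 5.0
# Chain rule: H(X|Y) = H(XY) - H(Y) = 1 and H(Y|X) = H(XY) - H(X) = 2.
```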
7 Mutual information
[Figure: Venn diagram for the example X = (B₁, B₂, B₃), Y = (B₁⊕B₂, B₂⊕B₄, B₃⊕B₄, B₅), showing the regions H(X|Y), I(X;Y), and H(Y|X) inside H(X) and H(Y).]
8 Mutual information
The mutual information is defined as
I(X;Y) ≔ H(X) − H(X|Y) = H(Y) − H(Y|X):
by how much does knowing X reduce the entropy of Y? Always non-negative: I(X;Y) ≥ 0.
Conditional mutual information: I(X;Y|Z) ≔ H(X|Z) − H(X|YZ).
Chain rule for mutual information: I(XY;Z) = I(X;Z) + I(Y;Z|X).
Simple intuitive interpretation.
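For finite joint distributions, mutual information can be computed directly from the identity I(X;Y) = H(X) + H(Y) − H(XY); a small sketch:

```python
from math import log2

def H(dist):
    """Entropy (bits) of {outcome: probability}."""
    return sum(p * log2(1 / p) for p in dist.values() if p > 0)

def marginal(joint, i):
    m = {}
    for pair, p in joint.items():
        m[pair[i]] = m.get(pair[i], 0.0) + p
    return m

def I(joint):
    """I(X;Y) = H(X) + H(Y) - H(XY) for a joint distribution {(x, y): p}."""
    return H(marginal(joint, 0)) + H(marginal(joint, 1)) - H(joint)

print(I({(0, 0): 0.5, (1, 1): 0.5}))                      # fully correlated bits: 1.0
print(I({(a, b): 0.25 for a in (0, 1) for b in (0, 1)}))  # independent bits: 0.0
```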
9 Example: a biased coin
A coin with an ε bias toward heads or tails is tossed several times. Let B ∈ {H, T} be the direction of the bias, and suppose that a priori both options are equally likely: H(B) = 1.
How many tosses are needed to find B? Let T₁, …, T_k be a sequence of tosses. Start with k = 2.
10 What do we learn about B?
I(B;T₁T₂) = I(B;T₁) + I(B;T₂|T₁)
= I(B;T₁) + I(BT₁;T₂) − I(T₁;T₂)
≤ I(B;T₁) + I(BT₁;T₂)
= I(B;T₁) + I(B;T₂) + I(T₁;T₂|B)
= I(B;T₁) + I(B;T₂) = 2·I(B;T₁).
Similarly, I(B;T₁…T_k) ≤ k·I(B;T₁). To determine B with constant accuracy, we need 0 < c ≤ I(B;T₁…T_k) ≤ k·I(B;T₁), so k = Ω(1/I(B;T₁)).
11 Kullback-Leibler (KL) divergence
A distance measure between distributions on the same space (not symmetric, so not a metric). Plays a key role in information theory.
D(P‖Q) ≔ Σ_x P[x] · log(P[x]/Q[x]).
D(P‖Q) ≥ 0, with equality when P = Q.
Caution: D(P‖Q) ≠ D(Q‖P)!
12 Properties of KL divergence
Connection to mutual information: I(X;Y) = E_{y∼Y} D(X|_{Y=y} ‖ X). If X ⊥ Y, then X|_{Y=y} = X, and both sides are 0.
Pinsker's inequality: ‖P − Q‖₁ = O(√(D(P‖Q))).
Tight: D(B_{1/2+ε} ‖ B_{1/2}) = Θ(ε²).
13 Back to the coin example
I(B;T₁) = E_{b∼B} D(T₁|_{B=b} ‖ T₁) = D(B_{1/2±ε} ‖ B_{1/2}) = Θ(ε²).
Hence k = Ω(1/I(B;T₁)) = Ω(1/ε²).
"Follow the information learned from the coin tosses." This can be done using combinatorics, but the information-theoretic language is more natural for expressing what's going on.
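Both the asymmetry caution and the Θ(ε²) behaviour of the biased-coin divergence are easy to observe numerically; a short sketch:

```python
from math import log2

def kl(P, Q):
    """D(P || Q) in bits; assumes Q[x] > 0 wherever P[x] > 0."""
    return sum(p * log2(p / Q[x]) for x, p in P.items() if p > 0)

# Asymmetry: D(P||Q) != D(Q||P) in general.
P, Q = {"a": 0.9, "b": 0.1}, {"a": 0.5, "b": 0.5}
print(kl(P, Q), kl(Q, P))

# D(B_{1/2+eps} || B_{1/2}) = Theta(eps^2): the ratio to eps^2 stays bounded.
fair = {"H": 0.5, "T": 0.5}
for eps in (0.1, 0.01, 0.001):
    d = kl({"H": 0.5 + eps, "T": 0.5 - eps}, fair)
    print(eps, d / eps**2)  # approaches 2/ln 2 ~ 2.885 as eps -> 0
```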
14 Back to communication
The reason information theory is so important for communication is that information-theoretic quantities readily operationalize. We can attach an operational meaning to Shannon's entropy: H(X) ≈ the cost of transmitting X.
Let C(X) be the (expected) cost of transmitting a sample of X.
15 Is H(X) = C(X)? Not quite.
Let the trit T ∼ U{1,2,3}. An optimal prefix code (say 1 → 0, 2 → 10, 3 → 11) gives C(T) = 5/3 ≈ 1.67 > H(T) = log 3 ≈ 1.58.
It is always the case that C(X) ≥ H(X).
16 But H(X) and C(X) are close
Huffman's coding: C(X) ≤ H(X) + 1.
This is a compression result: a low-information message is turned into a short one.
Therefore: H(X) ≤ C(X) ≤ H(X) + 1.
17 Shannon's noiseless coding
The cost of communicating many copies of X scales as H(X). Shannon's source coding theorem: let C(Xⁿ) be the cost of transmitting n independent copies of X. Then the amortized transmission cost is
lim_{n→∞} C(Xⁿ)/n = H(X).
This equation gives H(X) its operational meaning.
18 H(X) operationalized
[Figure: samples X₁, …, X_n, … are sent over the communication channel at a cost of H(X) per copy.]
19 H(X) is nicer than C(X)
H(X) is additive for independent variables. Let T₁, T₂ ∼ U{1,2,3} be independent trits. Then
H(T₁T₂) = log 9 = 2 log 3, but
C(T₁T₂) = 29/9 ≈ 3.22 < C(T₁) + C(T₂) = 5/3 + 5/3 = 10/3 ≈ 3.33.
H also works well with concepts such as channel capacity.
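The expected costs above (5/3 for one trit, 29/9 for a pair) can be reproduced with a short Huffman sketch; summing the merged weights is a standard way to get the expected codeword length without building the code tree explicitly.

```python
import heapq
from fractions import Fraction

def huffman_cost(probs):
    """Expected codeword length C of an optimal prefix (Huffman) code."""
    heap = [(p, i) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    cost, label = 0, len(probs)
    while len(heap) > 1:
        p1, _ = heapq.heappop(heap)
        p2, _ = heapq.heappop(heap)
        cost += p1 + p2  # each merge pushes its two subtrees one level deeper
        heapq.heappush(heap, (p1 + p2, label))
        label += 1
    return cost

one_trit = huffman_cost([Fraction(1, 3)] * 3)   # 5/3  ~ 1.67 (> log 3 ~ 1.58)
two_trits = huffman_cost([Fraction(1, 9)] * 9)  # 29/9 ~ 3.22 (< 2 * 5/3 = 10/3)
print(one_trit, two_trits)
```

Note that two_trits < 2 * one_trit, exactly the failure of additivity for C that the slide points out.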
20 Proof of Shannon's noiseless coding
n·H(X) = H(Xⁿ) ≤ C(Xⁿ) ≤ H(Xⁿ) + 1.
The equality is additivity of entropy; the upper bound is compression (Huffman).
Therefore lim_{n→∞} C(Xⁿ)/n = H(X).
21 Operationalizing other quantities
Conditional entropy H(X|Y) (cf. the Slepian-Wolf theorem):
[Figure: X₁, …, X_n are sent over the communication channel at a cost of H(X|Y) per copy, to a receiver who already holds Y₁, …, Y_n.]
22 Operationalizing other quantities
Mutual information I(X;Y):
[Figure: a party holding X₁, …, X_n lets the other party sample Y₁, …, Y_n over the communication channel at a cost of I(X;Y) per copy.]
23 Information theory and entropy
Allows us to formalize intuitive notions. Operationalized in the context of one-way transmission and related problems. Has nice properties (additivity, chain rule, ...).
Next, we discuss extensions to more interesting communication scenarios.
24 Communication complexity
Focus on the two-party randomized setting.
[Figure: Alice holds X, Bob holds Y, and they share randomness R; together A and B implement a functionality F(X,Y), e.g. F(X,Y) = "X = Y?".]
25 Communication complexity
Goal: implement a functionality F(X,Y). A protocol π(X,Y) computing F(X,Y) exchanges messages m₁(X,R), m₂(Y,m₁,R), m₃(X,m₁,m₂,R), ... until both players know F(X,Y).
Communication cost = number of bits exchanged.
26 Communication complexity
Numerous applications/potential applications (some will be discussed later today). Considerably more difficult to obtain lower bounds than for transmission (though still much easier than for other models of computation!).
27 Communication complexity
(Distributional) communication complexity with input distribution μ and error ε: CC(F,μ,ε); error ε w.r.t. μ.
(Randomized/worst-case) communication complexity: CC(F,ε); error ≤ ε on all inputs.
Yao's minimax: CC(F,ε) = max_μ CC(F,μ,ε).
28 Examples
X, Y ∈ {0,1}ⁿ. Equality: EQ(X,Y) ≔ 1_{X=Y}.
CC(EQ,ε) = O(log(1/ε)); CC(EQ,0) ≥ n.
29 Equality
F is "X = Y?". μ is a distribution where w.p. ½ X = Y and w.p. ½ (X,Y) are uniformly random.
Protocol: Alice sends MD5(X) [128 bits]; Bob replies "X = Y?" [1 bit].
Error? Only when X ≠ Y but the hashes collide. This shows that CC(EQ, μ, ε) ≤ 129 for ε on the order of 2⁻¹²⁸.
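A sketch of this hash-and-compare protocol, using Python's hashlib (representing the inputs as byte strings is my choice):

```python
import hashlib

def equality_protocol(x: bytes, y: bytes) -> bool:
    """Alice sends MD5(x) -- 128 bits; Bob answers "X = Y?" -- 1 bit (129 bits total).
    The answer is wrong only if x != y but the two hashes collide."""
    alice_message = hashlib.md5(x).digest()  # 16 bytes = 128 bits on the channel
    return alice_message == hashlib.md5(y).digest()

print(equality_protocol(b"same input", b"same input"))  # True
print(equality_protocol(b"same input", b"different"))   # False
```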
30 Examples
X, Y ∈ {0,1}ⁿ. Inner product: IP(X,Y) ≔ Σᵢ XᵢYᵢ (mod 2).
CC(IP,0) = n − o(n). In fact, using information complexity: CC(IP,ε) = n − o_ε(n).
31 Information complexity
Information complexity IC(F,ε) is to communication complexity CC(F,ε) as Shannon's entropy H(X) is to transmission cost C(X).
32 Information complexity
The smallest amount of information Alice and Bob need to exchange to solve F. How is information measured? The communication cost of a protocol is the number of bits exchanged; the information cost of a protocol is the amount of information revealed.
33 Basic definition 1: the information cost of a protocol
Prior distribution: (X,Y) ∼ μ. Alice holds X, Bob holds Y; running the protocol π produces the transcript Π.
IC(π,μ) ≔ I(Π;Y|X) + I(Π;X|Y)
= (what Alice learns about Y) + (what Bob learns about X).
34 Example
F is "X = Y?". μ is a distribution where w.p. ½ X = Y and w.p. ½ (X,Y) are random. Protocol: MD5(X) [128 bits]; "X = Y?" [1 bit].
IC(π,μ) = I(Π;Y|X) + I(Π;X|Y) = 65.5 bits
= (what Alice learns about Y) + (what Bob learns about X).
35 The prior μ matters a lot for information cost!
If μ = 1_{(x,y)} is a singleton, then IC(π,μ) = 0.
36 Example
F is "X = Y?". μ is a distribution where (X,Y) are just uniformly random. Same protocol: MD5(X) [128 bits]; "X = Y?" [1 bit].
IC(π,μ) = I(Π;Y|X) + I(Π;X|Y) = 128 bits
= (what Alice learns about Y) + (what Bob learns about X).
37 Basic definition 2: information complexity
Communication complexity: CC(F,μ,ε) = min_{π computes F with error ε} ‖π‖.
Analogously: IC(F,μ,ε) ≔ inf_{π computes F with error ε} IC(π,μ).
(An infimum, rather than a minimum, is needed here.)
38 Prior-free information complexity
Using minimax we can get rid of the prior. For communication, we had CC(F,ε) = max_μ CC(F,μ,ε).
For information: IC(F,ε) ≔ inf_{π computes F with error ε} max_μ IC(π,μ).
39 Ex: the information complexity of Equality
What is IC(EQ,0)? Consider the following protocol, with X ∈ {0,1}ⁿ held by Alice and Y ∈ {0,1}ⁿ by Bob. The players publicly sample a non-singular matrix A ∈ Z₂ⁿˣⁿ with rows A₁, …, A_n. In round i, Alice sends Aᵢ·X and Bob replies Aᵢ·Y. Continue for n steps, or until a disagreement is discovered.
40 Analysis (sketch)
If X ≠ Y, the protocol will terminate in O(1) rounds on average, and thus reveal O(1) information. If X = Y, the players only learn the fact that X = Y (≤ 1 bit of information). Thus the protocol has O(1) information complexity for any prior μ.
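To see the O(1) behaviour empirically, here is a simulation of a simplified variant in which each round uses an independent random parity rather than the rows of one non-singular matrix; when X ≠ Y, each round exposes the disagreement with probability 1/2, so the expected number of rounds is about 2, independently of n.

```python
import random

def rounds_until_disagreement(x, y, rng):
    """Round i: both players announce <a_i, input> mod 2 for a shared random a_i;
    stop at the first disagreement (or give up after n rounds)."""
    n = len(x)
    for r in range(1, n + 1):
        a = [rng.randint(0, 1) for _ in range(n)]
        px = sum(ai & xi for ai, xi in zip(a, x)) % 2
        py = sum(ai & yi for ai, yi in zip(a, y)) % 2
        if px != py:
            return r
    return n

rng = random.Random(0)
n, trials = 32, 2000
x = [rng.randint(0, 1) for _ in range(n)]
y = x[:]
y[0] ^= 1  # x and y differ in exactly one position
avg = sum(rounds_until_disagreement(x, y, rng) for _ in range(trials)) / trials
print(avg)  # concentrates around 2
```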
41 Operationalizing IC: information equals amortized communication
Recall [Shannon]: lim_{n→∞} C(Xⁿ)/n = H(X).
Turns out: lim_{n→∞} CC(Fⁿ,μⁿ,ε)/n = IC(F,μ,ε), for ε > 0. [Error ε allowed on each copy.]
For ε = 0: lim_{n→∞} CC(Fⁿ,μⁿ,0⁺)/n = IC(F,μ,0).
[Whether lim_{n→∞} CC(Fⁿ,μⁿ,0)/n matches is an interesting open problem.]
42 Information = amortized communication
lim_{n→∞} CC(Fⁿ,μⁿ,ε)/n = IC(F,μ,ε). Two directions: ≤ and ≥.
Compare the entropy proof: n·H(X) = H(Xⁿ) ≤ C(Xⁿ) ≤ H(Xⁿ) + 1 (additivity of entropy; compression via Huffman).
43 The ≤ direction
lim_{n→∞} CC(Fⁿ,μⁿ,ε)/n ≤ IC(F,μ,ε).
Start with a protocol π solving F whose IC(π,μ) is close to IC(F,μ,ε). Show how to compress many copies of π into a protocol whose communication cost is close to its information cost. More on compression later.
44 The ≥ direction
lim_{n→∞} CC(Fⁿ,μⁿ,ε)/n ≥ IC(F,μ,ε).
Use the fact that CC(Fⁿ,μⁿ,ε) ≥ IC(Fⁿ,μⁿ,ε), together with additivity of information complexity:
IC(Fⁿ,μⁿ,ε)/n = IC(F,μ,ε).
45 Proof: additivity of information complexity
Let T₁(X₁,Y₁) and T₂(X₂,Y₂) be two two-party tasks, e.g. "solve F(X,Y) with error ε w.r.t. μ". Then
IC(T₁⊗T₂, μ₁×μ₂) = IC(T₁,μ₁) + IC(T₂,μ₂).
The direction ≤ is easy; ≥ is the interesting direction.
46 IC(T₁,μ₁) + IC(T₂,μ₂) ≤ IC(T₁⊗T₂, μ₁×μ₂)
Start from a protocol π for T₁⊗T₂ with prior μ₁×μ₂ whose information cost is I. Show how to construct two protocols, π₁ for T₁ with prior μ₁ and π₂ for T₂ with prior μ₂, with information costs I₁ and I₂ respectively, such that I₁ + I₂ = I.
47 From π(X₁X₂, Y₁Y₂), construct:
π₁(X₁,Y₁): publicly sample X₂ ∼ μ₂; Bob privately samples Y₂ ∼ μ₂|X₂; run π(X₁X₂, Y₁Y₂).
π₂(X₂,Y₂): publicly sample Y₁ ∼ μ₁; Alice privately samples X₁ ∼ μ₁|Y₁; run π(X₁X₂, Y₁Y₂).
48 Analysis of π₁
In π₁ (public X₂, Bob privately samples Y₂ ∼ μ₂|X₂):
Alice learns about Y₁: I(Π;Y₁|X₁X₂). Bob learns about X₁: I(Π;X₁|Y₁Y₂X₂).
I₁ = I(Π;Y₁|X₁X₂) + I(Π;X₁|Y₁Y₂X₂).
49 Analysis of π₂
In π₂ (public Y₁, Alice privately samples X₁ ∼ μ₁|Y₁):
Alice learns about Y₂: I(Π;Y₂|X₁X₂Y₁). Bob learns about X₂: I(Π;X₂|Y₁Y₂).
I₂ = I(Π;Y₂|X₁X₂Y₁) + I(Π;X₂|Y₁Y₂).
50 Adding I₁ and I₂
I₁ + I₂ = I(Π;Y₁|X₁X₂) + I(Π;X₁|Y₁Y₂X₂) + I(Π;Y₂|X₁X₂Y₁) + I(Π;X₂|Y₁Y₂)
= I(Π;Y₁|X₁X₂) + I(Π;Y₂|X₁X₂Y₁) + I(Π;X₂|Y₁Y₂) + I(Π;X₁|Y₁Y₂X₂)
= I(Π;Y₁Y₂|X₁X₂) + I(Π;X₁X₂|Y₁Y₂)    (chain rule, applied twice)
= I.
51 Summary
Information complexity is additive. Operationalized via "information = amortized communication":
lim_{n→∞} CC(Fⁿ,μⁿ,ε)/n = IC(F,μ,ε).
It seems to be the right analogue of entropy for interactive computation.
52 Entropy vs. information complexity

                  Entropy                   IC
Additive?         Yes                       Yes
Operationalized   lim_n C(Xⁿ)/n             lim_n CC(Fⁿ,μⁿ,ε)/n
Compression?      Huffman: C(X) ≤ H(X)+1    ???!
53 Can interactive communication be compressed?
Is it true that CC(F,μ,ε) ≤ IC(F,μ,ε) + O(1)?
Less ambitiously: CC(F,μ,O(ε)) = O(IC(F,μ,ε))?
(Almost) equivalently: given a protocol π with IC(π,μ) = I, can Alice and Bob simulate π using O(I) communication?
Not known in general.
54 Direct sum theorems
Let F be any functionality. Let C(F) be the cost of implementing F, and let Fⁿ be the functionality of implementing n independent copies of F. The direct sum problem: does C(Fⁿ) ≥ n·C(F), at least approximately? (In most cases it is obvious that C(Fⁿ) ≤ n·C(F).)
55 Direct sum: randomized communication complexity
Is it true that CC(Fⁿ,μⁿ,ε) = Ω(n·CC(F,μ,ε))?
Is it true that CC(Fⁿ,ε) = Ω(n·CC(F,ε))?
56 Direct product: randomized communication complexity
Direct sum: CC(Fⁿ,μⁿ,ε) = Ω(n·CC(F,μ,ε))?
Direct product: CC(Fⁿ,μⁿ,1−(1−ε)ⁿ) = Ω(n·CC(F,μ,ε))?
57 Direct sum for randomized CC and interactive compression
Direct sum: CC(Fⁿ,μⁿ,ε) = Ω(n·CC(F,μ,ε))?
In the limit: n·IC(F,μ,ε) = Ω(n·CC(F,μ,ε))?
Interactive compression: CC(F,μ,ε) = O(IC(F,μ,ε))?
The same question!
58 The big picture
[Diagram: IC(F,μ,ε) equals IC(Fⁿ,μⁿ,ε)/n by additivity (= direct sum for information); IC(Fⁿ,μⁿ,ε)/n equals lim CC(Fⁿ,μⁿ,ε)/n since information = amortized communication; closing the remaining gap from CC(F,μ,ε) down to IC(F,μ,ε) is interactive compression, equivalently the direct sum question for communication.]
59 Current results for compression
A protocol π that has C bits of communication, conveys I bits of information over prior μ, and works in r rounds can be simulated:
using O(I + r) bits of communication;
using O(√(I·C)) bits of communication;
using 2^{O(I)} bits of communication;
if μ = μ_X × μ_Y, using O(I · polylog C) bits of communication.
60 Their direct sum counterparts
CC(Fⁿ,μⁿ,ε) = Ω(n^{1/2}·CC(F,μ,ε)).
CC(Fⁿ,ε) = Ω(n^{1/2}·CC(F,ε)).
For product distributions μ = μ_X × μ_Y, CC(Fⁿ,μⁿ,ε) = Ω(n·CC(F,μ,ε)).
When the number of rounds is bounded by r ≪ n, a direct sum theorem holds.
61 Direct product
The best one can hope for is a statement of the type:
CC(Fⁿ, μⁿ, 1 − 2^{−O(n)}) = Ω(n·IC(F,μ,1/3)).
Can prove: CC(Fⁿ, μⁿ, 1 − 2^{−O(n)}) = Ω(n^{1/2}·CC(F,μ,1/3)).
62 Proof 2: compressing a one-round protocol
Say Alice speaks: IC(π,μ) = I(M;X|Y). Recall KL divergence:
I(M;X|Y) = E_{XY} D(M_{XY} ‖ M_Y) = E_{XY} D(M_X ‖ M_Y).
Bottom line: Alice has M_X; Bob has M_Y; the goal is to sample from M_X using ≈ D(M_X ‖ M_Y) communication.
63 The dart board
Interpret the public randomness as a sequence of random points ("darts") in U × [0,1], where U is the universe of all possible messages. The first dart under the histogram of a distribution M is distributed according to M.
[Figure: a histogram over u₁, u₂, … ∈ U with bar heights q_{u₁}, q_{u₂}, …]
64-66 Proof idea
Sample from M_X using O(log(1/ε) + D(M_X ‖ M_Y)) communication, with statistical error ε.
[Figures, slides 64-66: the shared darts are drawn against both histograms M_X and M_Y. Alice selects u₄, the first dart under the histogram of M_X, and sends hashes h₁(u₄), h₂(u₄), … of it; Bob checks the hashes against his candidate darts under M_Y, then under the doubled histogram 2·M_Y, and so on, until a unique candidate consistent with all of Alice's hashes (up to h_{log 1/ε}) remains.]
67 Analysis
If M_X(u₄) ≈ 2^k · M_Y(u₄), then the protocol will reach round k of doubling. There will be ≈ 2^k candidates, so about k + log(1/ε) hashes are needed to narrow down to one. The contribution of u₄ to the cost is
M_X(u₄) · (log(M_X(u₄)/M_Y(u₄)) + log(1/ε)),
and D(M_X ‖ M_Y) = Σ_u M_X(u) · log(M_X(u)/M_Y(u)). Done!
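The claim underpinning this scheme, that the first dart under the histogram is an exact sample, can be checked empirically. The sketch below demonstrates only that sampling step (the hashing that lets Bob locate the same dart is omitted), on a toy three-element universe of my choosing.

```python
import random

def first_dart_under(dist, darts):
    """The u-coordinate of the first shared dart (u, height) lying under dist's
    histogram is an exact sample from dist."""
    for u, height in darts:
        if height < dist[u]:
            return u
    return None  # vanishingly unlikely given enough darts

rng = random.Random(1)
universe = ["a", "b", "c"]
M_X = {"a": 0.5, "b": 0.3, "c": 0.2}

trials = 20000
counts = {u: 0 for u in universe}
for _ in range(trials):
    darts = [(rng.choice(universe), rng.random()) for _ in range(200)]
    counts[first_dart_under(M_X, darts)] += 1
print({u: round(counts[u] / trials, 3) for u in universe})  # close to M_X
```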
68 External information cost
(X,Y) ∼ μ; an external observer Charlie watches the transcript Π of the protocol π.
IC^ext(π,μ) ≔ I(Π;XY) = what Charlie learns about (X,Y).
69 Example
F is "X = Y?". μ is a distribution where w.p. ½ X = Y and w.p. ½ (X,Y) are random. Protocol: MD5(X); "X = Y?".
IC^ext(π,μ) = I(Π;XY) = 129 bits = what Charlie learns about (X,Y).
70 External information cost
It is always the case that IC^ext(π,μ) ≥ IC(π,μ).
If μ = μ_X × μ_Y is a product distribution, then IC^ext(π,μ) = IC(π,μ).
71 External information complexity
IC^ext(F,μ,ε) ≔ inf_{π computes F with error ε} IC^ext(π,μ).
Can it be operationalized?
72 Operational meaning of IC^ext?
Conjecture: zero-error communication scales like external information:
lim_{n→∞} CC(Fⁿ,μⁿ,0)/n = IC^ext(F,μ,0)?
Recall: lim_{n→∞} CC(Fⁿ,μⁿ,0⁺)/n = IC(F,μ,0).
73 Example: transmission with a strong prior
X, Y ∈ {0,1}. μ is such that X ∼ U{0,1}, and X = Y with a very high probability (say 1 − 1/√n). F(X,Y) = X is just the "transmit X" function. Clearly, π should just have Alice send X to Bob.
IC(F,μ,0) = IC(π,μ) = H(1/√n) = o(1).
IC^ext(F,μ,0) = IC^ext(π,μ) = 1.
74 Example: transmission with a strong prior
IC(F,μ,0) = IC(π,μ) = H(1/√n) = o(1); IC^ext(F,μ,0) = IC^ext(π,μ) = 1.
CC(Fⁿ,μⁿ,0⁺) = o(n), while CC(Fⁿ,μⁿ,0) = Ω(n).
Other examples, e.g. the two-bit AND function, fit into this picture.
75 Additional directions
Information complexity; interactive coding; information theory in TCS.
76 Interactive coding theory
So far we have focused the discussion on noiseless coding. What if the channel has noise? [What kind of noise?] In the non-interactive case, each channel has a capacity C.
77 Channel capacity
The amortized number of channel uses needed to send X over a noisy channel of capacity C is H(X)/C. This decouples the task from the channel!
78 Example: the binary symmetric channel
Each bit gets independently flipped with probability ε < 1/2.
[Figure: 0 → 0 and 1 → 1 each with probability 1 − ε; 0 → 1 and 1 → 0 each with probability ε.]
One-way capacity: 1 − H(ε).
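The capacity formula 1 − H(ε) is a one-liner to evaluate; a small sketch:

```python
from math import log2

def binary_entropy(eps):
    """H(eps) in bits."""
    if eps in (0.0, 1.0):
        return 0.0
    return -(eps * log2(eps) + (1 - eps) * log2(1 - eps))

def bsc_capacity(eps):
    """One-way capacity 1 - H(eps) of the binary symmetric channel."""
    return 1 - binary_entropy(eps)

print(bsc_capacity(0.0))             # 1.0: a noiseless channel
print(bsc_capacity(0.5))             # 0.0: a fully random channel carries nothing
print(round(bsc_capacity(0.11), 2))  # ~0.5: roughly two channel uses per bit
```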
79 Interactive channel capacity
It is not clear that one can decouple the channel from the task in such a clean way; interactive capacity is much harder to calculate/reason about.
Example: the binary symmetric channel with flip probability ε. One-way capacity: 1 − H(ε). Interactive capacity (for simple pointer jumping, [Kol-Raz 13]): 1 − Θ(√(H(ε))).
80 Information theory in communication complexity and beyond
A natural extension would be to multi-party communication complexity. There has been some success in the number-in-hand case. What about the number-on-forehead model? Explicit bounds for log n players would imply explicit ACC⁰ circuit lower bounds.
81 Naïve multi-party information cost
[Figure: three players on-forehead: A sees (Y,Z), B sees (X,Z), C sees (X,Y).]
IC(π,μ) ≔ I(Π;X|YZ) + I(Π;Y|XZ) + I(Π;Z|XY).
82 Naïve multi-party information cost
IC(π,μ) = I(Π;X|YZ) + I(Π;Y|XZ) + I(Π;Z|XY) doesn't seem to work: secure multi-party computation [Ben-Or, Goldwasser, Wigderson] means that anything can be computed at near-zero information cost. (Although these constructions require the players to share private channels/randomness.)
83 Communication and beyond
The rest of today: data structures; streaming; distributed computing; privacy. Exact communication complexity bounds. Extended formulations lower bounds. Parallel repetition?
84 Thank You!
85 Open problem: computability of IC
Given the truth table of F(X,Y), the prior μ, and ε, compute IC(F,μ,ε).
Via IC(F,μ,ε) = lim_{n→∞} CC(Fⁿ,μⁿ,ε)/n one can compute a sequence of upper bounds, but the rate of convergence as a function of n is unknown.
86 Open problem: computability of IC
One can compute IC_r(F,μ,ε), the r-round information complexity of F, but the rate of convergence as a function of r is unknown.
Conjecture: IC_r(F,μ,ε) − IC(F,μ,ε) = O_{F,μ,ε}(1/r²).
This is the known relationship for the two-bit AND function.
More informationMath 4310 Handout - Quotient Vector Spaces
Math 4310 Handout - Quotient Vector Spaces Dan Collins The textbook defines a subspace of a vector space in Chapter 4, but it avoids ever discussing the notion of a quotient space. This is understandable
More informationDiscrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 2
CS 70 Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 2 Proofs Intuitively, the concept of proof should already be familiar We all like to assert things, and few of us
More informationPrivacy and Security in the Internet of Things: Theory and Practice. Bob Baxley; bob@bastille.io HitB; 28 May 2015
Privacy and Security in the Internet of Things: Theory and Practice Bob Baxley; bob@bastille.io HitB; 28 May 2015 Internet of Things (IoT) THE PROBLEM By 2020 50 BILLION DEVICES NO SECURITY! OSI Stack
More informationBayesian logistic betting strategy against probability forecasting. Akimichi Takemura, Univ. Tokyo. November 12, 2012
Bayesian logistic betting strategy against probability forecasting Akimichi Takemura, Univ. Tokyo (joint with Masayuki Kumon, Jing Li and Kei Takeuchi) November 12, 2012 arxiv:1204.3496. To appear in Stochastic
More informationMetric Spaces. Chapter 1
Chapter 1 Metric Spaces Many of the arguments you have seen in several variable calculus are almost identical to the corresponding arguments in one variable calculus, especially arguments concerning convergence
More information1 Inner Products and Norms on Real Vector Spaces
Math 373: Principles Techniques of Applied Mathematics Spring 29 The 2 Inner Product 1 Inner Products Norms on Real Vector Spaces Recall that an inner product on a real vector space V is a function from
More informationSolutions to Homework 10
Solutions to Homework 1 Section 7., exercise # 1 (b,d): (b) Compute the value of R f dv, where f(x, y) = y/x and R = [1, 3] [, 4]. Solution: Since f is continuous over R, f is integrable over R. Let x
More informationMIMO CHANNEL CAPACITY
MIMO CHANNEL CAPACITY Ochi Laboratory Nguyen Dang Khoa (D1) 1 Contents Introduction Review of information theory Fixed MIMO channel Fading MIMO channel Summary and Conclusions 2 1. Introduction The use
More information1. First-order Ordinary Differential Equations
Advanced Engineering Mathematics 1. First-order ODEs 1 1. First-order Ordinary Differential Equations 1.1 Basic concept and ideas 1.2 Geometrical meaning of direction fields 1.3 Separable differential
More informationFACTORING LARGE NUMBERS, A GREAT WAY TO SPEND A BIRTHDAY
FACTORING LARGE NUMBERS, A GREAT WAY TO SPEND A BIRTHDAY LINDSEY R. BOSKO I would like to acknowledge the assistance of Dr. Michael Singer. His guidance and feedback were instrumental in completing this
More informationST 371 (IV): Discrete Random Variables
ST 371 (IV): Discrete Random Variables 1 Random Variables A random variable (rv) is a function that is defined on the sample space of the experiment and that assigns a numerical variable to each possible
More informationCOMP 250 Fall 2012 lecture 2 binary representations Sept. 11, 2012
Binary numbers The reason humans represent numbers using decimal (the ten digits from 0,1,... 9) is that we have ten fingers. There is no other reason than that. There is nothing special otherwise about
More information11 Multivariate Polynomials
CS 487: Intro. to Symbolic Computation Winter 2009: M. Giesbrecht Script 11 Page 1 (These lecture notes were prepared and presented by Dan Roche.) 11 Multivariate Polynomials References: MC: Section 16.6
More informationPUTNAM TRAINING POLYNOMIALS. Exercises 1. Find a polynomial with integral coefficients whose zeros include 2 + 5.
PUTNAM TRAINING POLYNOMIALS (Last updated: November 17, 2015) Remark. This is a list of exercises on polynomials. Miguel A. Lerma Exercises 1. Find a polynomial with integral coefficients whose zeros include
More informationNo: 10 04. Bilkent University. Monotonic Extension. Farhad Husseinov. Discussion Papers. Department of Economics
No: 10 04 Bilkent University Monotonic Extension Farhad Husseinov Discussion Papers Department of Economics The Discussion Papers of the Department of Economics are intended to make the initial results
More informationCHAPTER 2 Estimating Probabilities
CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a
More informationChapter 1 Introduction
Chapter 1 Introduction 1. Shannon s Information Theory 2. Source Coding theorem 3. Channel Coding Theory 4. Information Capacity Theorem 5. Introduction to Error Control Coding Appendix A : Historical
More information1 Formulating The Low Degree Testing Problem
6.895 PCP and Hardness of Approximation MIT, Fall 2010 Lecture 5: Linearity Testing Lecturer: Dana Moshkovitz Scribe: Gregory Minton and Dana Moshkovitz In the last lecture, we proved a weak PCP Theorem,
More informationQuotient Rings and Field Extensions
Chapter 5 Quotient Rings and Field Extensions In this chapter we describe a method for producing field extension of a given field. If F is a field, then a field extension is a field K that contains F.
More informationBasics of Statistical Machine Learning
CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar
More informationUnified Lecture # 4 Vectors
Fall 2005 Unified Lecture # 4 Vectors These notes were written by J. Peraire as a review of vectors for Dynamics 16.07. They have been adapted for Unified Engineering by R. Radovitzky. References [1] Feynmann,
More informationEntropy and Mutual Information
ENCYCLOPEDIA OF COGNITIVE SCIENCE 2000 Macmillan Reference Ltd Information Theory information, entropy, communication, coding, bit, learning Ghahramani, Zoubin Zoubin Ghahramani University College London
More informationExact Nonparametric Tests for Comparing Means - A Personal Summary
Exact Nonparametric Tests for Comparing Means - A Personal Summary Karl H. Schlag European University Institute 1 December 14, 2006 1 Economics Department, European University Institute. Via della Piazzuola
More informationCollinear Points in Permutations
Collinear Points in Permutations Joshua N. Cooper Courant Institute of Mathematics New York University, New York, NY József Solymosi Department of Mathematics University of British Columbia, Vancouver,
More informationTHE NUMBER OF REPRESENTATIONS OF n OF THE FORM n = x 2 2 y, x > 0, y 0
THE NUMBER OF REPRESENTATIONS OF n OF THE FORM n = x 2 2 y, x > 0, y 0 RICHARD J. MATHAR Abstract. We count solutions to the Ramanujan-Nagell equation 2 y +n = x 2 for fixed positive n. The computational
More information9231 FURTHER MATHEMATICS
CAMBRIDGE INTERNATIONAL EXAMINATIONS GCE Advanced Subsidiary Level and GCE Advanced Level MARK SCHEME for the October/November 2012 series 9231 FURTHER MATHEMATICS 9231/21 Paper 2, maximum raw mark 100
More informationALMOST COMMON PRIORS 1. INTRODUCTION
ALMOST COMMON PRIORS ZIV HELLMAN ABSTRACT. What happens when priors are not common? We introduce a measure for how far a type space is from having a common prior, which we term prior distance. If a type
More informationPrinciple of Data Reduction
Chapter 6 Principle of Data Reduction 6.1 Introduction An experimenter uses the information in a sample X 1,..., X n to make inferences about an unknown parameter θ. If the sample size n is large, then
More informationGames of Incomplete Information
Games of Incomplete Information Jonathan Levin February 00 Introduction We now start to explore models of incomplete information. Informally, a game of incomplete information is a game where the players
More information1 The Line vs Point Test
6.875 PCP and Hardness of Approximation MIT, Fall 2010 Lecture 5: Low Degree Testing Lecturer: Dana Moshkovitz Scribe: Gregory Minton and Dana Moshkovitz Having seen a probabilistic verifier for linearity
More informationReview Horse Race Gambling and Side Information Dependent horse races and the entropy rate. Gambling. Besma Smida. ES250: Lecture 9.
Gambling Besma Smida ES250: Lecture 9 Fall 2008-09 B. Smida (ES250) Gambling Fall 2008-09 1 / 23 Today s outline Review of Huffman Code and Arithmetic Coding Horse Race Gambling and Side Information Dependent
More informationTriangle deletion. Ernie Croot. February 3, 2010
Triangle deletion Ernie Croot February 3, 2010 1 Introduction The purpose of this note is to give an intuitive outline of the triangle deletion theorem of Ruzsa and Szemerédi, which says that if G = (V,
More informationAn Introduction to Competitive Analysis for Online Optimization. Maurice Queyranne University of British Columbia, and IMA Visitor (Fall 2002)
An Introduction to Competitive Analysis for Online Optimization Maurice Queyranne University of British Columbia, and IMA Visitor (Fall 2002) IMA, December 11, 2002 Overview 1 Online Optimization sequences
More informationStrong Parallel Repetition Theorem for Free Projection Games
Strong Parallel Repetition Theorem for Free Projection Games Boaz Barak Anup Rao Ran Raz Ricky Rosen Ronen Shaltiel April 14, 2009 Abstract The parallel repetition theorem states that for any two provers
More information2.1 Complexity Classes
15-859(M): Randomized Algorithms Lecturer: Shuchi Chawla Topic: Complexity classes, Identity checking Date: September 15, 2004 Scribe: Andrew Gilpin 2.1 Complexity Classes In this lecture we will look
More informationFixed Point Theorems
Fixed Point Theorems Definition: Let X be a set and let T : X X be a function that maps X into itself. (Such a function is often called an operator, a transformation, or a transform on X, and the notation
More information1 Review of Least Squares Solutions to Overdetermined Systems
cs4: introduction to numerical analysis /9/0 Lecture 7: Rectangular Systems and Numerical Integration Instructor: Professor Amos Ron Scribes: Mark Cowlishaw, Nathanael Fillmore Review of Least Squares
More informationHOMEWORK 5 SOLUTIONS. n!f n (1) lim. ln x n! + xn x. 1 = G n 1 (x). (2) k + 1 n. (n 1)!
Math 7 Fall 205 HOMEWORK 5 SOLUTIONS Problem. 2008 B2 Let F 0 x = ln x. For n 0 and x > 0, let F n+ x = 0 F ntdt. Evaluate n!f n lim n ln n. By directly computing F n x for small n s, we obtain the following
More informationMaximum Likelihood Estimation
Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for
More informationDifferentiation and Integration
This material is a supplement to Appendix G of Stewart. You should read the appendix, except the last section on complex exponentials, before this material. Differentiation and Integration Suppose we have
More informationChapter 3. Cartesian Products and Relations. 3.1 Cartesian Products
Chapter 3 Cartesian Products and Relations The material in this chapter is the first real encounter with abstraction. Relations are very general thing they are a special type of subset. After introducing
More information1 if 1 x 0 1 if 0 x 1
Chapter 3 Continuity In this chapter we begin by defining the fundamental notion of continuity for real valued functions of a single real variable. When trying to decide whether a given function is or
More informationLecture 4: BK inequality 27th August and 6th September, 2007
CSL866: Percolation and Random Graphs IIT Delhi Amitabha Bagchi Scribe: Arindam Pal Lecture 4: BK inequality 27th August and 6th September, 2007 4. Preliminaries The FKG inequality allows us to lower bound
More information