Computer Science and Engineering, UCSD
October 7, 1999
Goldreich-Levin Theorem
Author: Bellare

The Goldreich-Levin Theorem

1 The problem

We fix an integer $n$ for the length of the strings involved. If $a$ is an $n$-bit string and $1 \le i \le n$ then $a(i)$ denotes the $i$-th bit of $a$. If $a, b$ are $n$-bit strings then

$$\langle a, b\rangle = a(1)b(1) + a(2)b(2) + \cdots + a(n)b(n)$$

denotes the inner product of $a$ and $b$. The operations here are modulo two, meaning we work over the finite field of two elements, so the value above is a bit.

We are given an oracle $B_x : \{0,1\}^n \to \{0,1\}$ and a real number $\epsilon > 0$ such that

$$\Pr\left[\, B_x(r) = \langle x, r\rangle \;:\; r \stackrel{R}{\leftarrow} \{0,1\}^n \,\right] = \frac{1+\epsilon}{2}. \tag{1}$$

We call $\epsilon$ the advantage of $B_x$. We are not directly given $x$. We are also given another oracle $EQ_x : \{0,1\}^n \to \{0,1\}$ which given any $y \in \{0,1\}^n$ returns 1 if $y = x$ and 0 otherwise. In other words, we can test whether or not a given string equals $x$.

The problem is, given these two oracles, to find $x$. We want to figure out how to do it and also what the complexity is. More precisely, we wish to design an algorithm $A$ that given the above two oracles returns a string $x'$. The success probability of $A$ is the probability that $x = x'$, taken over the coin tosses of $A$. We seek $A$ having success probability at least $1/2$. We let $q_B$ denote the number of calls made by $A$ to $B_x$ and $q_E$ the number of calls to $EQ_x$. We let $t$ be the running time plus the size of the code of $A$ in some fixed RAM model of computation. (Alternatively, $A$ is a circuit and $t$ is the size of the circuit.) We want to figure out $q_B, q_E, t$ as functions of $n, \epsilon$. Certainly we want them all to be $\mathrm{poly}(n, 1/\epsilon)$, but we want to know exactly what the polynomial is. The algorithm should work for any $x \in \{0,1\}^n$.

We would actually like something slightly more general. We would like to view $q_B, q_E, t$ as given, and lower bound the success probability of $A$ as a function of $n, \epsilon, q_B, q_E, t$. But this problem does not appear to have been studied.

2 Background

The context of Goldreich and Levin [5] is to find a hard-core predicate for any one-way function.
Given a length-preserving one-way function $f : \{0,1\}^* \to \{0,1\}^*$, define $F(x, r) = (f(x), r)$ where $|x| = |r|$. This is also a one-way function. Now the claim is that $\langle x, r\rangle$ is a hard-core predicate for this function. This means that if there were an efficient algorithm to predict $\langle x, r\rangle$ given $f(x), r$, there would also be an efficient algorithm to compute a pre-image of $f(x)$ given $f(x)$. Probabilities here are taken over the random choice of $x$ and $r$.

The technical part of the reduction amounts to the above problem. The given algorithm for predicting $\langle x, r\rangle$ is $B_x$, and the oracle that can verify a choice of $x$ is implicit because we have $f(x)$ and can compute $f$.

The proof in Section 4 is due to Rackoff, using ideas of [1]. It is a simplification of the original proof of [5]. It is along the same lines as the proof in [7]. Two other excellent sources are Goldreich's survey [3] and book [4], which present a proof using the same ideas and also present security improvements. A recent paper of Levin [6] might have further security improvements. It would be nice to read [3, 4, 6] and figure out the improvements.

3 The high advantage case

The oracle $B_x$ partitions the set of $n$-bit strings into two parts. The "good" strings are those inputs on which the oracle is correct and the bad strings are those inputs on which the oracle is wrong. It is useful to name these sets:

$$\mathrm{Gd} = \{\, s \in \{0,1\}^n : B_x(s) = \langle x, s\rangle \,\}$$
$$\mathrm{Bd} = \{\, s \in \{0,1\}^n : B_x(s) \ne \langle x, s\rangle \,\}.$$

Our assumption can then equivalently be stated as

$$|\mathrm{Gd}| = \frac{1+\epsilon}{2}\cdot 2^n \quad\text{and}\quad |\mathrm{Bd}| = \frac{1-\epsilon}{2}\cdot 2^n.$$

This will help us think about the problem.

Recall our problem is to find $x$ given oracle access to $B_x$ and $EQ_x$. To get some intuition, first assume that $B_x$ is always correct, meaning it has advantage $\epsilon = 1$. In other words $\mathrm{Gd} = \{0,1\}^n$, meaning we simply have an oracle which given any $n$-bit string $r$ returns $\langle x, r\rangle$. How can we find $x$? For $i = 1,\ldots,n$ let $e_i$ denote the string having a one in position $i$ and zeros elsewhere. Observe that $x(i) = \langle x, e_i\rangle$. So it suffices to make the queries $e_1,\ldots,e_n$ to $B_x$ to compute $x$. We did not even need the $EQ_x$ oracle.

Now suppose the advantage of $B_x$ is less than 1, but still very close to 1. Let $\delta = 1 - \epsilon$. This by assumption is small, close to 0.
A first thought is to proceed as above; we make queries $e_1,\ldots,e_n$ to $B_x$. But the probability of success here could be zero. Even though $B_x$ is correct on most inputs, these particular inputs may not be among them. Meaning, even though $\mathrm{Gd}$ occupies a $1 - \delta/2$ fraction of $\{0,1\}^n$, it could still be true that some or all of the points $e_1,\ldots,e_n$ are in $\mathrm{Bd}$.

If we want any chance of success, we must only invoke $B_x$ on random points, so that we have a chance of falling in $\mathrm{Gd}$. This leads to the idea of using self-correction (cf. [2]). The algorithm of Figure 1 takes as input any $n$-bit string $z$ and attempts to compute $\langle x, z\rangle$ by invoking $B_x$ only on random points, each individually unrelated to $z$. Remember that arithmetic operations are modulo two.

  Algorithm $\mathrm{SC}^{B_x}(z)$
    $r \stackrel{R}{\leftarrow} \{0,1\}^n$
    $b_1 \leftarrow B_x(z + r)$ ; $b_2 \leftarrow B_x(r)$
    Return $b_1 - b_2$

  Figure 1: The SC algorithm that attempts to compute $\langle x, z\rangle$ given $z$.

To analyze the algorithm, observe that the linearity of the inner product function tells us that $\langle x, z\rangle = \langle x, z + r\rangle - \langle x, r\rangle$ for any $n$-bit string $r$. If $r$ is random, so is $z + r$. The two are not independent, but it is still true that both, individually, are uniformly distributed, and that's what we will use. The probability below is over the random choice of $r$ made by the algorithm of Figure 1.

$$\begin{aligned}
\Pr[\, b_1 - b_2 \ne \langle x, z\rangle \,] &\le \Pr[\, B_x(z + r) \ne \langle x, z + r\rangle \text{ or } B_x(r) \ne \langle x, r\rangle \,] \\
&= \Pr[\, z + r \in \mathrm{Bd} \text{ or } r \in \mathrm{Bd} \,] \\
&\le \Pr[\, z + r \in \mathrm{Bd} \,] + \Pr[\, r \in \mathrm{Bd} \,] \\
&= \frac{1-\epsilon}{2} + \frac{1-\epsilon}{2} = 1 - \epsilon = \delta.
\end{aligned}$$

In other words, our algorithm is correct except with probability $\delta$. This is quite nice since its input $z$ is not necessarily random. In particular $z$ might be in $\mathrm{Bd}$.

To find $x$ we use the same observation as above, namely that it suffices to find the $n$ bits $\langle x, e_i\rangle$ for $i = 1,\ldots,n$. Do this by calling $\mathrm{SC}^{B_x}(e_i)$ for $i = 1,\ldots,n$. (Note that each call results in a new random choice of $r$.) The probability that all these $n$ calls return the right answer is at least $1 - n\delta$. So as long as $\delta \le 1/(2n)$, the success probability of our procedure is at least $1/2$.

The requirement $\delta \le 1/(2n)$ translates to $\epsilon \ge 1 - 1/(2n)$, meaning $\epsilon$ is tending to 1 as $n$ tends to infinity. We would like to do better and find $x$ even when $\epsilon$ is not only a constant, but perhaps even an inverse polynomial in $n$.

Here's a thought. Above, we were sloppy in upper bounding the failure probability of the algorithm $\mathrm{SC}^{B_x}(z)$. The way we did it is to say that we wanted both $b_1$ and $b_2$ to be correct; all other cases we took to be failure. But actually, the output of the algorithm is also correct when both $b_1$ and $b_2$ are wrong, because we are working mod two. In other words, the bad case is not that at least one of the two is wrong, but that exactly one of the two is wrong, and this might have a smaller probability of happening. Thus

$$\Pr[\, b_1 - b_2 \ne \langle x, z\rangle \,] = \Pr[\, z + r \in \mathrm{Bd} \text{ and } r \in \mathrm{Gd} \,] + \Pr[\, z + r \in \mathrm{Gd} \text{ and } r \in \mathrm{Bd} \,].$$

However $r$ and $z + r$ are not independently distributed, so the value of the terms above is unclear.
It turns out that there can be a value of $z$ such that both probabilities above equal $(1-\epsilon)/2$, in which case the sum is $1 - \epsilon = \delta$ just as before. (You can try to build this example as an exercise.) So this idea doesn't help after all. We need a different algorithm.
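
The self-correction idea of this section can be sketched in Python. This is only an illustration under stated assumptions, not part of the original notes: $n$-bit strings are packed into Python integers (bit $i$ of the integer plays the role of $x(i+1)$), addition modulo two is XOR, and the names `make_noisy_oracle`, `sc` and `recover_high_advantage` are hypothetical. The oracle is simulated from a known $x$ and an explicit bad set purely so that the sketch can run.

```python
import random

def inner(a, b):
    """Inner product mod 2 of two n-bit strings packed into Python ints."""
    return bin(a & b).count("1") % 2

def make_noisy_oracle(x, bad):
    """Hypothetical stand-in for B_x: correct everywhere except on the set `bad`."""
    def B(s):
        bit = inner(x, s)
        return bit ^ 1 if s in bad else bit
    return B

def sc(B, z, n, rng):
    """SC of Figure 1: query B only at the random points z + r and r."""
    r = rng.getrandbits(n)
    return B(z ^ r) ^ B(r)          # subtraction mod 2 is XOR

def recover_high_advantage(B, n, rng):
    """Recover x bit by bit by self-correcting the unit-vector queries."""
    x = 0
    for i in range(n):
        e_i = 1 << i                # unit vector with a one in position i
        x |= sc(B, e_i, n, rng) << i
    return x
```

If `bad` is chosen to be exactly the set of unit vectors, querying $e_1,\ldots,e_n$ directly returns every bit of $x$ flipped, whereas the self-corrected calls touch only random points and fail with probability at most $n\delta$.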

4 The general case

If $k$ is any integer we let $[k] = \{1,\ldots,k\}$. We introduce a parameter $m$ which will eventually be set to $c\lg(n)$ for some constant $c$ to be specified. If $R = (r_1,\ldots,r_m)$ is a sequence of $n$-bit strings and $S \subseteq [m]$ then we let $R[S] = \sum_{j\in S} r_j$. The sum here is performed componentwise modulo two, so the result is an $n$-bit string. Let $S_1,\ldots,S_{2^m}$ be a listing of all subsets of $[m]$ in some canonical order.

The goal of the $\text{Strong-SC}^{B_x}$ algorithm of Figure 2 is the same as that of $\mathrm{SC}^{B_x}$, namely to compute $\langle x, z\rangle$ for a given input $z \in \{0,1\}^n$. However our new algorithm has additional inputs. It takes a sequence $R = (r_1,\ldots,r_m)$ of $n$-bit strings which will be selected at random. It also takes a sequence $b_1,\ldots,b_m$ of bits. For the moment assume that $b_j = \langle x, r_j\rangle$ for $j = 1,\ldots,m$. How we can find these bits is a question we will address later; for now, just assume we managed to guess the "right" values of the $m$ inner products $\langle x, r_1\rangle,\ldots,\langle x, r_m\rangle$. In the algorithm, $\mathit{sum}$ is an integer counter and the "+" in "$\mathit{sum} + c_i$" is integer addition; all other operations are the usual mod two ones.

  Algorithm $\text{Strong-SC}^{B_x}(z;\, r_1,\ldots,r_m;\, b_1,\ldots,b_m)$
    $\mathit{sum} \leftarrow 0$
    For $i = 1,\ldots,2^m$ do
      $b[S_i] \leftarrow \sum_{j\in S_i} b_j$
      $c_i \leftarrow B_x(z + R[S_i]) - b[S_i]$
      $\mathit{sum} \leftarrow \mathit{sum} + c_i$
    End For
    If $\mathit{sum} \ge 2^m/2$ then $b \leftarrow 1$ else $b \leftarrow 0$
    Return $b$

  Figure 2: The Strong-SC algorithm that attempts to compute $\langle x, z\rangle$ given a random sequence of $n$-bit strings $R = (r_1,\ldots,r_m)$ and auxiliary bits $b_1,\ldots,b_m$.

The idea behind the algorithm is the following. The linearity of the inner-product function tells us that for any $i = 1,\ldots,2^m$ we have

$$\langle x, z + R[S_i]\rangle = \langle x, z\rangle + \sum_{j\in S_i} \langle x, r_j\rangle.$$

If $b_j = \langle x, r_j\rangle$ then the right-hand side is $\langle x, z\rangle + \sum_{j\in S_i} b_j$. Denoting the sum here by $b[S_i]$ we can solve as follows:

$$\langle x, z\rangle = \langle x, z + R[S_i]\rangle - b[S_i].$$

We want to use this equation to determine $\langle x, z\rangle$. We will attempt to compute $\langle x, z + R[S_i]\rangle$ by calling $B_x$ on input $z + R[S_i]$. We will argue that with high enough probability over the choice of the sequence $R$ we have

$$\langle x, z\rangle = B_x(z + R[S_i]) - b[S_i]$$

for a majority of the values of $i \in [2^m]$. Thus, taking a majority vote over the values of $B_x(z + R[S_i]) - b[S_i]$ as $i = 1,\ldots,2^m$ will yield a bit that with high probability equals $\langle x, z\rangle$.

Once we have an algorithm that with high enough probability determines $\langle x, z\rangle$ for a given $z$, we can compute $x$ as before. Namely we would call this algorithm on $e_1,\ldots,e_n$ and thus retrieve $x$ bit by bit.

There are several issues to be dealt with in turning this high-level picture into an actual algorithm to recover $x$. First, we must pin down what we mean by "high enough" probabilities in the above, and analyze the Strong-SC algorithm to see that it accomplishes its task with such probabilities. Second, we have the issue of the bits $b_1,\ldots,b_m$ that above we assumed magically to be the "right" ones.

Let's deal with the second issue first. It is in solving this that we make use of the second oracle $EQ_x$ which, recall, tells us whether a given input is the hidden $x$ or not. So far we have not used this.

The full recovery algorithm is depicted in Figure 3. We begin by picking $r_1,\ldots,r_m$ at random. The key point is that $m = O(\lg n)$. So there are only polynomially many vectors $b_1,\ldots,b_m$ to consider. We simply try them all. For each choice of the vector $b_1,\ldots,b_m$ we run the Strong-SC algorithm $n$ times, on the inputs $e_1,\ldots,e_n$, to generate candidates for the bits of $x$. Each candidate $x$ is tested using $EQ_x$. Some choice of $b_1,\ldots,b_m$ is correct, meaning $b_j = \langle x, r_j\rangle$ for $j = 1,\ldots,m$, so in that iteration of the loop we find $x$.

  Algorithm $\text{Recover}^{B_x, EQ_x}(1^n)$
    For $j = 1,\ldots,m$ do $r_j \stackrel{R}{\leftarrow} \{0,1\}^n$ End For
    For $i = 1,\ldots,2^m$ do
      Let $b_1 \ldots b_m$ be the binary representation of $i - 1$
      For $k = 1,\ldots,n$ do $y(k) \leftarrow \text{Strong-SC}^{B_x}(e_k;\, r_1,\ldots,r_m;\, b_1,\ldots,b_m)$ End For
      $y \leftarrow y(1) \ldots y(n)$
      If $EQ_x(y) = 1$ then $x' \leftarrow y$
    End For
    Return $x'$

  Figure 3: The Recover algorithm that attempts to compute $x$.

Notice the crucial role of the testing oracle $EQ_x$. Had that not been present, we would have $2^m$ candidates for $x$ but no way of telling which of these is the right one.
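
The two algorithms of Figures 2 and 3 can be sketched together in Python. This is a minimal sketch under several assumptions that are not in the original notes: strings are packed into integers with XOR as addition mod two, subsets of $[m]$ are indexed by bitmask, the vote runs over the non-empty subsets only (since $R[\emptyset] = 0$ would query $B_x$ at $z$ itself rather than at a random point, and an odd number of votes avoids ties), the search stops at the first candidate accepted by $EQ_x$, and `None` is returned if no candidate is accepted. The names `subset_xors`, `strong_sc` and `recover` are hypothetical.

```python
import random

def inner(a, b):
    """Inner product mod 2 of two bit strings packed into Python ints."""
    return bin(a & b).count("1") % 2

def subset_xors(values):
    """XOR of values[j] over j in S, for every subset S indexed by bitmask."""
    out = [0] * (1 << len(values))
    for mask in range(1, len(out)):
        low = mask & -mask                        # lowest set bit of mask
        out[mask] = out[mask ^ low] ^ values[low.bit_length() - 1]
    return out

def strong_sc(B, z, RS, bS):
    """Majority vote of B(z + R[S]) - b[S] over the non-empty subsets S."""
    votes = sum(B(z ^ RS[mask]) ^ bS[mask] for mask in range(1, len(RS)))
    return 1 if 2 * votes > len(RS) - 1 else 0

def recover(B, EQ, n, m, rng):
    """Try every guess for b_1..b_m; return the candidate accepted by EQ."""
    r = [rng.getrandbits(n) for _ in range(m)]
    RS = subset_xors(r)                           # R[S] for every subset S
    for guess in range(1 << m):
        b = [(guess >> j) & 1 for j in range(m)]
        bS = subset_xors(b)                       # b[S] for every subset S
        y = 0
        for k in range(n):
            y |= strong_sc(B, 1 << k, RS, bS) << k
        if EQ(y):
            return y
    return None
```

Because a candidate is returned only after `EQ` accepts it, this sketch either outputs $x$ or reports failure; it never outputs a wrong string. For clarity it queries `B` afresh for every guess, although the query points `z ^ RS[mask]` do not depend on the guessed bits.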

The main claim for the analysis thus reduces to a claim about the Strong-SC algorithm when it gets the right choice of the auxiliary bits. In that case we can upper bound the probability that it fails to compute $\langle x, z\rangle$ as shown in the next lemma. Note the algorithm itself is deterministic; the only random choice below is $R = (r_1,\ldots,r_m)$.

Lemma 1 Let $M = 2^m$. Then for any $z \in \{0,1\}^n$ we have

$$\Pr\left[\, \text{Strong-SC}^{B_x}(z;\, r_1,\ldots,r_m;\, \langle x, r_1\rangle,\ldots,\langle x, r_m\rangle) \ne \langle x, z\rangle \;:\; r_1,\ldots,r_m \stackrel{R}{\leftarrow} \{0,1\}^n \,\right] \le \frac{1}{M\epsilon^2}.$$

We will prove this lemma later. Given this we can easily estimate the failure probability of the Recover algorithm. The coin tosses here are those of the algorithm itself.

Lemma 2 Let $M = 2^m$. Then

$$\Pr\left[\, \text{Recover}^{B_x, EQ_x}(1^n) \ne x \,\right] \le \frac{n}{M\epsilon^2}.$$

Proof of Lemma 2: Due to the loop considering all possible values of $b_1,\ldots,b_m$ we need only consider the case where $b_j = \langle x, r_j\rangle$ for $j = 1,\ldots,m$. In that case the Recover algorithm invokes Strong-SC a total of $n$ times, using $n$ different values of $z$ but always the same values of $r_1,\ldots,r_m$ and $b_1,\ldots,b_m$. The probability that any of these calls returns the wrong answer is at most the sum over $k = 1,\ldots,n$ of the probability that the $k$-th call returns the wrong answer. But the probability of a wrong answer on any call is bounded as per Lemma 1, so the sum is at most $n/(M\epsilon^2)$.

Evaluating the complexity of the above procedure yields the following conclusion.

Theorem 3 Let $m$ be a parameter and $M = 2^m$. Then there is an algorithm $A$ which makes at most $q_B = nM$ calls to its $B_x$ oracle, at most $q_E = M$ calls to its $EQ_x$ oracle, has time-complexity (execution time plus size of code) at most $t = O(nM^2)$ and success probability at least $1 - \delta$ where $\delta = n\epsilon^{-2}/M$.

The count $q_B = nM$ reflects that the queries $B_x(e_k + R[S_i])$ do not depend on $b_1,\ldots,b_m$, so their answers can be computed once and reused in every iteration of the outer loop of Recover.

To get success probability of $1/2$ we would set $M = 2n\epsilon^{-2}$. In that case $m = \lg(M) = \lg(n) + 2\lg(\epsilon^{-1}) + 1$. The running time of $A$ is $O(n^3\epsilon^{-4})$ and $q_B = O(n^2\epsilon^{-2})$ and $q_E = O(n\epsilon^{-2})$.

What remains is to prove Lemma 1. That's the bulk of the work. We will first sketch the main ideas. Then we will stop and recall some probability theory, and use that to conclude the proof.

We will define a random variable $X_i$ for $i \in [M]$ that takes the value 1 when the value of $B_x(z + R[S_i]) - b[S_i]$ is correct, meaning equals $\langle x, z\rangle$. (Under the assumption that $b_1,\ldots,b_m$ are correct.) The random variables $X_1,\ldots,X_M$ are not independent. However, they satisfy a certain limited type of independence: they are pairwise independent.
This means that having the value of one of them doesn't help predict the value of another, even though having the value of two of them might help to predict others. This pairwise independence property is enough to prove Lemma 1 using Chebyshev's inequality. To do all this we need to step back and recall some probability theory.

Definition 4 Let $X_1,\ldots,X_M : S \to \mathbb{R}$ be real-valued functions on some sample space $S$. The latter is equipped with a probability distribution under which $X_1,\ldots,X_M$ are viewed as random variables. We say that $X_1,\ldots,X_M$ are pairwise independent if for every $i, j \in [M]$ with $i \ne j$ and every $a, b \in \mathbb{R}$ we have

$$\Pr[\, X_i = a \text{ and } X_j = b \,] = \Pr[\, X_i = a \,] \cdot \Pr[\, X_j = b \,].$$
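
For small parameters the pairwise independence used in this proof can be checked by brute force. The sketch below is an illustration only, with hypothetical helper names and subsets of $[m]$ encoded as bitmasks (an assumption of this sketch). It enumerates every sequence $R$ and counts how often each pair $(z + R[S_i],\, z + R[S_j])$ occurs. When the pair is uniform over all $4^n$ possibilities, the two points are independent and individually uniform, so any two indicator variables computed from them satisfy Definition 4.

```python
from collections import Counter
from itertools import product

def subset_xor(R, mask):
    """R[S]: XOR of the strings r_j whose index j is set in the bitmask."""
    acc = 0
    for j, r in enumerate(R):
        if (mask >> j) & 1:
            acc ^= r
    return acc

def joint_distribution(n, m, z, mask_i, mask_j):
    """Count (z + R[S_i], z + R[S_j]) over every sequence R of m n-bit strings."""
    counts = Counter()
    for R in product(range(1 << n), repeat=m):
        pair = (z ^ subset_xor(R, mask_i), z ^ subset_xor(R, mask_j))
        counts[pair] += 1
    return counts

def is_uniform_on_pairs(counts, n):
    """True when each of the 4^n possible pairs occurs equally often."""
    return len(counts) == 4 ** n and len(set(counts.values())) == 1
```

With $n = 2$ and $m = 3$, two distinct non-empty masks such as `0b011` and `0b110` give a uniform joint distribution, while using the same mask twice, or the empty mask, does not.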

To bring this into context, here's how we set up the random variables for the proof of Lemma 1. Let $S$ be the set of all $m$-element sequences with entries from $\{0,1\}^n$. Put a uniform distribution on $S$. (That corresponds to picking $r_1,\ldots,r_m$ at random.) Now for $i = 1,\ldots,M$ define $X_i : S \to \{0,1\}$ as follows, on any input $R = (r_1,\ldots,r_m) \in S$:

$$X_i(R) = \begin{cases} 1 & \text{if } B_x(z + R[S_i]) - \sum_{j\in S_i} \langle x, r_j\rangle = \langle x, z\rangle \\ 0 & \text{otherwise.} \end{cases}$$

This can be simplified by noting that the equality is true exactly when $B_x(z + R[S_i]) = \langle x, z + R[S_i]\rangle$, which in turn happens exactly when $z + R[S_i]$ falls in the good set of inputs. Thus

$$X_i(R) = \begin{cases} 1 & \text{if } z + R[S_i] \in \mathrm{Gd} \\ 0 & \text{otherwise.} \end{cases} \tag{2}$$

Our claim is that the random variables $X_1,\ldots,X_M$ are pairwise independent. Why? If $S_i \ne S_j$ then there is some string $r_k$ that belongs to one but not the other. Now given that operations are modulo two, a sum involving $r_k$ is unpredictable from a sum not involving $r_k$. So if we know that $z + R[S_i]$ is in $\mathrm{Gd}$, we still do not know whether $z + R[S_j]$ is in $\mathrm{Gd}$; given $z + R[S_i]$, the value of $z + R[S_j]$ is still uniformly distributed. You should probably play around a bit to convince yourself of this claim that $X_1,\ldots,X_M$ are pairwise independent, but this is the main idea.

Now let's go back to the general probability theory. Recall that if $Y$ is a random variable then its variance is

$$\mathrm{Var}[Y] = E\left[(Y - \mu)^2\right] = E\left[Y^2\right] - \mu^2,$$

where $\mu = E[Y]$ is the expectation of $Y$.

Lemma 5 Let $X_1,\ldots,X_M : S \to \mathbb{R}$ be pairwise independent random variables. Then

$$\mathrm{Var}[X_1 + \cdots + X_M] = \mathrm{Var}[X_1] + \cdots + \mathrm{Var}[X_M].$$

Proof of Lemma 5: Use the formula for the variance and the linearity of expectation to get

$$\begin{aligned}
\mathrm{Var}[X_1 + \cdots + X_M] &= E\left[(X_1 + \cdots + X_M)^2\right] - E[X_1 + \cdots + X_M]^2 \\
&= E[(X_1 + \cdots + X_M)(X_1 + \cdots + X_M)] - (E[X_1] + \cdots + E[X_M])^2 \\
&= E\Big[\sum_{i,j} X_i X_j\Big] - \sum_{i,j} E[X_i] \cdot E[X_j] \\
&= \sum_{i,j} E[X_i X_j] - \sum_{i,j} E[X_i] \cdot E[X_j] \\
&= \sum_i E\left[X_i^2\right] + \sum_{i \ne j} E[X_i X_j] - \sum_i E[X_i]^2 - \sum_{i \ne j} E[X_i] \cdot E[X_j] \\
&= \sum_i \left( E\left[X_i^2\right] - E[X_i]^2 \right) + \sum_{i \ne j} \left( E[X_i X_j] - E[X_i] \cdot E[X_j] \right) \\
&= \sum_i \mathrm{Var}[X_i] + \sum_{i \ne j} \left( E[X_i X_j] - E[X_i] \cdot E[X_j] \right).
\end{aligned}$$

The pairwise independence means that $E[X_i X_j] = E[X_i] \cdot E[X_j]$ whenever $i \ne j$. Thus the second sum above is zero, and we are done.
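
As a small worked instance of the expansion in the proof of Lemma 5 (added here only for illustration), take $M = 2$. The sum over $i \ne j$ ranges over the two ordered pairs $(1,2)$ and $(2,1)$, which is where the factor 2 comes from:

```latex
\begin{align*}
\mathrm{Var}[X_1 + X_2]
  &= E\left[(X_1 + X_2)^2\right] - \left(E[X_1] + E[X_2]\right)^2 \\
  &= E\left[X_1^2\right] + 2E[X_1 X_2] + E\left[X_2^2\right]
     - E[X_1]^2 - 2E[X_1]E[X_2] - E[X_2]^2 \\
  &= \mathrm{Var}[X_1] + \mathrm{Var}[X_2]
     + 2\left(E[X_1 X_2] - E[X_1]E[X_2]\right).
\end{align*}
```

The last term vanishes exactly when $E[X_1 X_2] = E[X_1]E[X_2]$, which pairwise independence guarantees.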

Lemma 6 Let $X_1,\ldots,X_M : S \to \mathbb{R}$ be pairwise independent random variables, let $X = X_1 + \cdots + X_M$, let $A > 0$ be a real number, and let $\mu = E[X_1] + \cdots + E[X_M]$. Then

$$\Pr[\, |X - \mu| > A \,] \le \frac{\mathrm{Var}[X_1] + \cdots + \mathrm{Var}[X_M]}{A^2}.$$

Proof of Lemma 6: Chebyshev's inequality tells us that

$$\Pr[\, |X - \mu| > A \,] \le \frac{\mathrm{Var}[X]}{A^2}.$$

Now apply Lemma 5. That's it.

Now we use Lemma 6. Recall that in Equation (2) above we defined the random variables $X_1,\ldots,X_M : S \to \{0,1\}$ that we need for the proof of Lemma 1, and said that they were pairwise independent. Now observe that

$$E[X_i] = 1 \cdot \Pr[X_i = 1] + 0 \cdot \Pr[X_i = 0] = \Pr[X_i = 1] = \Pr[\, z + R[S_i] \in \mathrm{Gd} \,] = \frac{1+\epsilon}{2}.$$

This is true because $R[S_i]$ is uniformly distributed in $\{0,1\}^n$. Now

$$\begin{aligned}
\mathrm{Var}[X_i] &= E\left[X_i^2\right] - E[X_i]^2 \\
&= E[X_i] - E[X_i]^2 \\
&= E[X_i] \cdot (1 - E[X_i]) \\
&= \frac{1+\epsilon}{2} \cdot \frac{1-\epsilon}{2} = \frac{1-\epsilon^2}{4}.
\end{aligned}$$

Let $X = X_1 + \cdots + X_M$ and $\mu = E[X]$. Linearity of expectation tells us that $\mu = M(1+\epsilon)/2$. Then observe that the probability that we want to bound in Lemma 1 is exactly

$$\Pr[\, X < M/2 \,] \le \Pr\left[\, |X - \mu| > \frac{M\epsilon}{2} \,\right] \le \frac{\mathrm{Var}[X_1] + \cdots + \mathrm{Var}[X_M]}{(M\epsilon/2)^2} = \frac{M(1-\epsilon^2)/4}{M^2\epsilon^2/4} \le \frac{1}{M\epsilon^2}$$

as desired. That concludes the proof of Lemma 1.
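
To make the parameter choice of Theorem 3 concrete, the following sketch computes $m$, $M$ and the resulting bounds for given $n$ and $\epsilon$. It is an illustration only: the function name `recover_parameters` is hypothetical, and rounding $M$ up to the next power of two (so that $m$ is an integer) is an assumption of this sketch rather than something stated in the notes.

```python
import math

def recover_parameters(n, eps):
    """Parameters in the spirit of Theorem 3 for success probability >= 1/2."""
    m = math.ceil(math.log2(2 * n / eps ** 2))   # smallest m with 2^m >= 2n/eps^2
    M = 2 ** m
    return {
        "m": m,
        "M": M,
        "failure_bound": n / (M * eps ** 2),     # Lemma 2: n / (M eps^2)
        "q_B": n * M,                            # calls to B_x in Theorem 3
        "q_E": M,                                # calls to EQ_x in Theorem 3
    }
```

For example, $n = 128$ and $\epsilon = 1/4$ give $2n\epsilon^{-2} = 4096$, hence $m = 12$, $M = 4096$, a failure bound of $128/(4096 \cdot 1/16) = 1/2$, $q_B = 524288$ and $q_E = 4096$.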

Acknowledgments

Thanks to Ramarathnam Venkatesan for pointers and comments.

References

[1] W. Alexi, B. Chor, O. Goldreich and C. Schnorr, "RSA and Rabin Functions: Certain Parts Are as Hard as the Whole," SIAM J. on Computing, Vol. 17, No. 2, 1988, pp. 194-209.

[2] M. Blum, M. Luby and R. Rubinfeld, "Self-testing/correcting with applications to numerical problems," Journal of Computer and System Sciences, Vol. 47, 1993, pp. 549-595.

[3] O. Goldreich, "Three XOR lemmas: An exposition," Manuscript available at http://www.wisdom.weizmann.ac.il/users/oded/papers.html. See Chapter 3.

[4] O. Goldreich, Modern cryptography, probabilistic proofs and pseudorandomness, Springer, 1999. See Appendix C.2.

[5] O. Goldreich and L. Levin, "A hard predicate for all one-way functions," Proceedings of the 21st Annual Symposium on the Theory of Computing, ACM, 1989.

[6] L. Levin, "Randomness and non-determinism," Manuscript available at http://www.cs.bu.edu/fac/lnd/research/publ.html.

[7] M. Luby, Pseudorandomness and cryptographic applications, Princeton Computer Science Notes, 1996.