CS85: You Can t Do That (Lower Bounds in Computer Science) Lecture Notes, Spring 2008. Amit Chakrabarti Dartmouth College

Similar documents

5 Boolean Decision Trees (February 11)

Soving Recurrence Relations

Asymptotic Growth of Functions

Chapter 6: Variance, the law of large numbers and the Monte-Carlo method

Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 13

In nite Sequences. Dr. Philippe B. Laval Kennesaw State University. October 9, 2008

Department of Computer Science, University of Otago

Infinite Sequences and Series

Week 3 Conditional probabilities, Bayes formula, WEEK 3 page 1 Expected value of a random variable

Chapter 7 Methods of Finding Estimators

Sequences and Series

Running Time ( 3.1) Analysis of Algorithms. Experimental Studies ( 3.1.1) Limitations of Experiments. Pseudocode ( 3.1.2) Theoretical Analysis

Project Deliverables. CS 361, Lecture 28. Outline. Project Deliverables. Administrative. Project Comments

CS103X: Discrete Structures Homework 4 Solutions

Output Analysis (2, Chapters 10 &11 Law)

Lecture 13. Lecturer: Jonathan Kelner Scribe: Jonathan Pines (2009)

Example 2 Find the square root of 0. The only square root of 0 is 0 (since 0 is not positive or negative, so those choices don t exist here).

Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed.

CS103A Handout 23 Winter 2002 February 22, 2002 Solving Recurrence Relations

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES

Convexity, Inequalities, and Norms

The Stable Marriage Problem

Lecture 4: Cauchy sequences, Bolzano-Weierstrass, and the Squeeze theorem

I. Chi-squared Distributions

SAMPLE QUESTIONS FOR FINAL EXAM. (1) (2) (3) (4) Find the following using the definition of the Riemann integral: (2x + 1)dx

Taking DCOP to the Real World: Efficient Complete Solutions for Distributed Multi-Event Scheduling

Hypothesis testing. Null and alternative hypotheses

Modified Line Search Method for Global Optimization

Properties of MLE: consistency, asymptotic normality. Fisher information.

Irreducible polynomials with consecutive zero coefficients

A probabilistic proof of a binomial identity

1. MATHEMATICAL INDUCTION

Chapter 5: Inner Product Spaces

Notes on exponential generating functions and structures.

Solutions to Exercises Chapter 4: Recurrence relations and generating functions

3 Basic Definitions of Probability Theory

Permutations, the Parity Theorem, and Determinants

A Faster Clause-Shortening Algorithm for SAT with No Restriction on Clause Length

Factors of sums of powers of binomial coefficients

Class Meeting # 16: The Fourier Transform on R n

THE HEIGHT OF q-binary SEARCH TREES

Lecture 2: Karger s Min Cut Algorithm

INFINITE SERIES KEITH CONRAD

THE ABRACADABRA PROBLEM

Incremental calculation of weighted mean and variance

CHAPTER 3 THE TIME VALUE OF MONEY

THE REGRESSION MODEL IN MATRIX FORM. For simple linear regression, meaning one predictor, the model is. for i = 1, 2, 3,, n

Lesson 15 ANOVA (analysis of variance)

CHAPTER 3 DIGITAL CODING OF SIGNALS

Vladimir N. Burkov, Dmitri A. Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT

Perfect Packing Theorems and the Average-Case Behavior of Optimal and Online Bin Packing

Solutions to Selected Problems In: Pattern Classification by Duda, Hart, Stork

2-3 The Remainder and Factor Theorems

MARTINGALES AND A BASIC APPLICATION

Math C067 Sampling Distributions

.04. This means $1000 is multiplied by 1.02 five times, once for each of the remaining sixmonth

The Power of Free Branching in a General Model of Backtracking and Dynamic Programming Algorithms

THE ARITHMETIC OF INTEGERS. - multiplication, exponentiation, division, addition, and subtraction

Lesson 17 Pearson s Correlation Coefficient

2. Degree Sequences. 2.1 Degree Sequences

1. C. The formula for the confidence interval for a population mean is: x t, which was

Lecture 5: Span, linear independence, bases, and dimension

UC Berkeley Department of Electrical Engineering and Computer Science. EE 126: Probablity and Random Processes. Solutions 9 Spring 2006

A Combined Continuous/Binary Genetic Algorithm for Microstrip Antenna Design

Non-life insurance mathematics. Nils F. Haavardsson, University of Oslo and DNB Skadeforsikring

NATIONAL SENIOR CERTIFICATE GRADE 12

Measures of Spread and Boxplots Discrete Math, Section 9.4

Lecture 3. denote the orthogonal complement of S k. Then. 1 x S k. n. 2 x T Ax = ( ) λ x. with x = 1, we have. i = λ k x 2 = λ k.

Estimating Probability Distributions by Observing Betting Practices

Overview of some probability distributions.

CHAPTER 7: Central Limit Theorem: CLT for Averages (Means)

Basic Elements of Arithmetic Sequences and Series

Approximating Area under a curve with rectangles. To find the area under a curve we approximate the area using rectangles and then use limits to find

. P. 4.3 Basic feasible solutions and vertices of polyhedra. x 1. x 2

Annuities Under Random Rates of Interest II By Abraham Zaks. Technion I.I.T. Haifa ISRAEL and Haifa University Haifa ISRAEL.

How To Understand The Theory Of Coectedess

where: T = number of years of cash flow in investment's life n = the year in which the cash flow X n i = IRR = the internal rate of return

Analysis Notes (only a draft, and the first one!)

Tradigms of Astundithi and Toyota

BINOMIAL EXPANSIONS In this section. Some Examples. Obtaining the Coefficients

Chair for Network Architectures and Services Institute of Informatics TU München Prof. Carle. Network Security. Chapter 2 Basics

Trigonometric Form of a Complex Number. The Complex Plane. axis. ( 2, 1) or 2 i FIGURE The absolute value of the complex number z a bi is

Repeating Decimals are decimal numbers that have number(s) after the decimal point that repeat in a pattern.

Chapter 14 Nonparametric Statistics

Lecture 4: Cheeger s Inequality

1 Computing the Standard Deviation of Sample Means

3. Greatest Common Divisor - Least Common Multiple

Confidence Intervals. CI for a population mean (σ is known and n > 30 or the variable is normally distributed in the.

Your organization has a Class B IP address of Before you implement subnetting, the Network ID and Host ID are divided as follows:

Factoring x n 1: cyclotomic and Aurifeuillian polynomials Paul Garrett <garrett@math.umn.edu>

Overview on S-Box Design Principles

Present Value Factor To bring one dollar in the future back to present, one uses the Present Value Factor (PVF): Concept 9: Present Value

A RANDOM PERMUTATION MODEL ARISING IN CHEMISTRY

PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY AN ALTERNATIVE MODEL FOR BONUS-MALUS SYSTEM

FIBONACCI NUMBERS: AN APPLICATION OF LINEAR ALGEBRA. 1. Powers of a matrix

Concept: Types of algorithms

Ramsey-type theorems with forbidden subgraphs

Exploratory Data Analysis

Transcription:

CS85: You Ca t Do That () Lecture Notes, Sprig 2008 Amit Chakrabarti Dartmouth College Latest Update: May 9, 2008

Lecture 1 Compariso Trees: Sortig ad Selectio Scribe: William Che 1.1 Sortig Defiitio 1.1.1 (The Sortig Problem). Give items (x 1, x 2,, x ) from a ordered uiverse, output a permutatio σ : [] [] such that x σ (1) < x σ (2) < x σ (). At preset we do ot kow how to prove superliear lower bouds for the Turig machie model for ay problem i NP, so we istead restrict ourselves to a more limited model of computatio, the compariso tree. We begi by cosiderig determiistic sortig algorithms. Determiistic Sortig Algorithms Whereas the Turig machie is allowed to move heads, read ad write symbols at every time step, i a compariso tree T we are oly allowed to make comparisos ad read the result. Without loss of geerality we ll assume that the x i s are distict, so every compariso results i either a < or >. At each iteral ode of T we make a compariso, the result of which gives tells us the ext compariso to make by directig us to either the left or the right child. At each leaf ode we must have eough iformatio to be able to output the sorted list of items. The cost of a sortig algorithm i this model is the just the height of such a compariso tree T, i.e., the maximum depth of a leaf of T. Now ote that T is i fact a biary tree, so we have l 2 h, h lg l, where l is the umber of leaves, ad h is the height of T. But ow we ote that sice there are! possible permutatios, ay correct tree must have at least! leaves, so the, by Stirlig s formula, we have h lg! = ( lg ). Ope Problem. Prove that some laguage L NP caot be decided i O() time o a Turig machie. Radomized Sortig Algorithms Now we tur our attetio to radomized compariso-based sortig algorithms ad prove that i this case access to a costat time radom umber geerator does ot give us ay additioal power. That is to say, it does ot chage our lower boud. A example of such a algorithm is the variat of quicksort where, at each recursive call, the pivot is 1

LECTURE 1. COMPARISON TREES: SORTING AND SELECTION chose uiformly at radom. Note that the radomess i ay radomized algorithm ca be captured as a ifiite biary radom strig, chose at the begiig, such that at every call of the radom umber geerator, we simply read off the ext bit (or the ext few bits) o the radom strig. The, we ca say that a radomized algorithm is just a probability distributio over determiistic algorithms (each obtaied by fixig the radom strig). I other words, whe callig the radomized algorithm, oe ca move all the radomess to the begiig, ad simply pick at radom a determiistic algorithm to call. With this view of radomized algorithms i mid, we describe a extremely useful lemma due to Adrew Chi-Chih Yao (1979). But first, some prelimiaries. Suppose we have some otio of cost of a algorithm (for some fuctio/problem f ) o a iput, i.e., a (oegative) fuctio cost : A X R +, where A is a space of (determiistic) algorithms ad X is a space of iputs. A example of a useful cost measure is ruig time. I what follows, we will cosider radom algorithms A λ, where λ is a probability distributio o A. Note that, per our above remarks, a radomized algorithm is specified by such a distributio λ. We will also cosider a radom iput X µ, for some distributio µ o X. Defiitio 1.1.2 (Distributioal, worst-case ad radomized complexity). I the above settig, the term E µ [cost(a, X)] is the just the expected cost of a particular algorithm a A o a radom iput X µ. The quatity D µ ( f ) = mi a E µ [cost(a, X)], where f is the fuctio we wat to compute, is called the distributioal complexity of f accordig to iput distributio µ. We costrast this to the worst case complexity of f, which we ca write as follows C( f ) = mi a max cost(a, x). x Fially, we defie R( f ), the radomized complexity of f as follows: R( f ) = mi λ From the defiitios above it follows easily that max E λ [cost(a, x)]. x µ D µ ( f ) C( f ) ad R( f ) C( f ). To obtai the latter iequality, cosider the trivial distributios λ that are each supported o a sigle algorithm a A. Theorem 1.1.3 (Yao s Miimax Lemma). We have 1. (The Easy Half) For all iput distributios µ, D µ ( f ) R( f ). 2. (The Difficult Half) max µ { Dµ ( f ) } = R( f ). The proof of part (2) is somewhat o-trivial ad requires the use of the liear programmig duality theorem. Moreover, we do ot eed part (2) at this poit, so we shall oly prove part (1). We ote i passig that part (2) ca be writte as max mi E µ [cost(a, X)] = mi max E λ [cost(a, x)]. µ a λ x Proof of Part (1). To visualize this proof, it helps to cosider the followig table, where X = {x 1, x 2, x 3,...}, A = {a 1, a 2, a 3,...} ad c i j = cost(a i, x j ): a 1 a 2 a 3 x 1 c 11 c 12 c 13 x 2 c 21 c 22 c 23 x 3 c 31 c 32 c 33....... 2

LECTURE 1. COMPARISON TREES: SORTING AND SELECTION While readig the argumet below, thik of row averages ad colum averages i the above table. Defie R λ ( f ) = max x E λ [cost(a, x)]. The, for all x, we have whece E λ [cost(a, x)] R λ ( f ), E µ [E λ [cost(a, X)]] R λ ( f ). Sice expectatio is essetially just summatio, we ca switch the order of summatio to get E λ [ Eµ [cost(a, X)] ] R λ ( f ), which implies that there must be a algorithm a such that E µ [cost(a, X)] R λ ( f ). Sice the distributioal complexity D µ ( f ) takes the miimum such a, it s clear that D µ ( f ) R λ ( f ). Sice this holds for ay λ, we have D µ ( f ) mi λ R λ ( f ) = R( f ). Before we returig to the issue of a lower boud for radomized sortig, we itroduce aother lemma. Lemma 1.1.4 (Markov s Iequality). If X is a radom variable takig oly oegative values, the for all t > 0, we have Pr[X t] E[X]/t. Proof. A easy oe-lier: E[X] = E[X X < t] Pr[X < t] + E[X X t] Pr[X t] 0 + t Pr[X t]. Corollary 1.1.5. Uder the above coditios, for ay costat c > 0, Pr[X > c E[X]] 1/c. Let m deote the umber of comparisos made o ay particular ru, or equivaletly its rutime. Cosider a radomized sortig algorithm a (i.e., a distributio over compariso trees), where we set C = max iput {E λ[m]} ote here that m is a fuctio of both the algorithm ad the iput. Theorem 1.1.6 (Radomized sortig lower boud). Let T (x) deote the expected umber of comparisos made by a radomized -elemet sortig algorithm o iput x. Let C = max x T (x). The C = ( lg ). Proof. Let A deote the space of all (determiistic) compariso trees that sort a -elemet array ad let X deote the space of all permutatios of []. Cosider a radomized sortig algorithm give by a probability distributio λ o A. As it stads, a radomized sortig algorithm is always correct o every iput, but has a radom rutime (betwee () ad O( 2 )) that depeds o its iput. (Such algorithms are called Las Vegas algorithms.) We wat to covert each a A ito a correspodig algorithm a that has a otrivially bouded rutime, but may sometimes spits out garbage. We do this by allowig at most 10C comparisos, ad outputtig some garbage if the sorted permutatio is still ukow. Let A deote a radom algorithm correspodig to a radom A λ. Corollary 1.1.5 implies Now defie We ca the rewrite the above iequality as Pr[A is wrog o x] = Pr[NumComps(A, x) > 10C] 1 10. { 1, if a is wrog o iput x cost(a, x) = 0, otherwise max E λ [cost(a, x)] 1 x 10. 3

LECTURE 1. COMPARISON TREES: SORTING AND SELECTION By Yao s miimax lemma, for all iput distributios µ, we have D µ (sortig with 10C comparisos) 1 10. Now pick µ to be the uiform distributio o X. Pick ay determiistic algorithm a with 10C comparisos that achieves the miimum i the defiitio of D µ. The, for X µ, we have Pr[a is wrog o X] 1/10. Now the height of a s tree is 10C, so the umber of leaves of a s tree is 2 10C, ad thus the umber of distict permutatios that a ca output is 2 10C. But ow we see that Pr[X is oe of the permutatios output by a] > 9 10, so 2 10C (9/10)! ad hece, by Stirlig s formula, C 1 ( ( )) 9 lg 10 10! = ( lg ). 1.2 Selectio Defiitio 1.2.1 (The Selectio Problem). Give items (x 1, x 2,, x ) from a ordered uiverse ad a iteger k [], the selectio problem is to retur the kth smallest elemet, i.e., x i such that { j : x j < x i } = k 1. Let V (, k) deote the height of the shortest compariso tree that solves this problem. Special Cases. The miimum (resp. maximum) selectio problems, where k = 1 (resp. k = ), ad the media problem, where k = 2. Miimum ad maximum ca clearly be doe i O() time. Other selectio problems are less trivial. 1.2.1 Miimum Selectio Theorem 1.2.2. Let T be a compariso tree that fids the miimum of elemets. The every leaf of T has depth at least 1. Sice T is biary, it must therefore have at least 2 1 leaves. Proof. Cosider ay leaf λ of T. If x i is the output at λ, the every x j ( j i) must have wo at least oe compariso o the path leadig to λ (if ot, we caot kow for sure that x j is ot the miimum). Sice each compariso ca be wo by at most oe elemet, we have depth(λ) 1. Corollary 1.2.3. V (, 1) = 1. 1.2.2 Geeral Selectio By a adversarial argumet, Hyafil [Hya76] proved that V (, k) k + (k 1) lg k 1. This was stregtheed by Fusseegger ad Gabow [FG79] to the boud stated i the ext theorem. Both these results give poor bouds for k. However, observe that V (, k) = V (, k). Note. For the special case where k = 2, we have V (, 2) = 2 + lg. It is a good exercise to prove this: both the upper ad the lower boud are iterestig. Also, k {1, 2, 1, } are the oly values for which we kow V (, k) exactly. Theorem 1.2.4 (Fusseegger & Gabow). V (, k) >= k + lg ( k 1). 4

LECTURE 1. COMPARISON TREES: SORTING AND SELECTION Proof. Let T be a compariso tree for selectig the kth smallest elemet. At a leaf λ of T, if we output x i, the every x j ( j i) must be settled with respect to x i, i.e., we must kow whether or ot x j < x i. I particular, we must kow the set S λ = {x j : x j < x i }. Fix a particular set U {x 1,..., x }, with U = k 1. Now cosider the followig set of leaves: L U = {λ : S λ = U}. Notice that the output at ay leaf i L U is the miimum of the elemets i U. Therefore, if we treat U as give, ad prue T accordigly, by elimiatig ay odes that perform comparisos ivolvig oe or more elemets of U, the the residual tree is oe that has leaf set L U ad fids the miimum of the k + 1 elemets of U. By Theorem 1.2.2, the prued tree must have at least 2 k leaves. Therefore, L U 2 k. The umber of leaves of T is the sum of L U over all ( k 1) choices for the set U. Therefore, the height of T must be This proves the theorem. 1.2.3 Media Selectio height(t ) (( ) lg k 1 2 k ) = k + ( ) lg. k 1 Turig ow to the media selectio special case, recall that by Stirlig s formula, we have ( ) ( 2 ) =, /2 which already gives us a good itial lower boud for media selectio: ( V, ) ( ) 2 2 + lg 2 1 2 + 1 2 lg O(1) = 3 2 o(). Usig more sophisticated techiques, oe ca prove the followig stroger lower bouds: V (, ) 2 2 o(). [BJ85] V (, ) 2 (2 + 2 50 ) o(). [DZ01] O the upper boud side, we agai have otrivial results, startig with the famous groups of five algorithm: V (, ) 2 6 + o(). [BFP + 73] V (, ) 2 3 + o(). [SPP76] V (, ) 2 2.995. [DZ99] 5

Lecture 2 Boolea Decisio Trees Scribe: William Che 2.1 Defiitios ad Basic Theorems Defiitio 2.1.1 (Decisio Tree Complexity). Let f : {0, 1} {0, 1} be some Boolea fuctio. We sometimes write f whe we wat to emphasize the iput size. A decisio tree T models a computatio that adaptively reads the bits of a iput x, brachig based o each bit read. Thus, at each iteral ode of T, we read a specific bit (idicated by the label of the ode) ad brach left or right depedig upo whether the bit read was a 0 or a 1. At each leaf ode we must have eough iformatio to determie f (x). We defie the determiistic decisio tree complexity D( f ) of f to be D( f ) = mi{height(t ) : T is a decisio tree that evaluates f }. For example, easy adversarial argumets show that for the or, ad ad parity fuctios, we have D(OR) = D(AND) = D(PAR) =. It is coveiet to itroduce a term for fuctios that are maximally hard to evaluate i the decisio tree model. Defiitio 2.1.2 (Evasiveess). A fuctio f : {0, 1} {0, 1} is said to be evasive (a.k.a. elusive) if D( f ) =. We shall try to relate this algorithmically defied quatity D( f ) to several more combiatorial measure of the hardess of f, that we ow defie. Defiitio 2.1.3 (Certificate Complexity). For a fuctio f : {0, 1} {0, 1}, defie a 0-certificate to be a strig α {0, 1, } such that x {0, 1} : x matches α f (x) = 0. That is to say, for ay iput x that matches α for all o- bits, we have f (x) = 0, o matter the settigs of the free variables give by the s. Defie the size of a certificate α to be the umber of exposed (i.e., o- ) bits. The, defie the 0-certificate complexity of f as C 0 ( f ) = max mi{size(α) : α is a 0-certificate that matches x}. x We defie 1-certificates ad C 1 ( f ) similarly. We use the covetio that C 0 ( f ) = if f 1, ad similarly for C 1 ( f ). Fially, we defie the certificate complexity C( f ) of a fuctio f to be the larger of the two, that is C( f ) = max{c 0 ( f ), C 1 ( f )}. Example 2.1.4. C 1 (OR ) = 1, whereas C 0 (OR ) =, so C(OR ) =. The quatity C( f ) is sometimes called the odetermiistic decisio tree complexity. Note that for ay decisio tree for f, ay leaf givig a aswer p {0, 1} must be of depth at least C p ( f ), so we have C( f ) D( f ). 6

LECTURE 2. BOOLEAN DECISION TREES Defiitio 2.1.5 (Sesitivity). For a fuctio f : {0, 1} {0, 1}, defie the sesitivity s x ( f ) for f o a particular iput x as follows s x ( f ) = {i [] : f (x (i) ) f (x)}, where we write x (i) to deote the strig x with the ith bit flipped. We the defie the sesitivity of f as s( f ) = max{s x ( f )}. x Defiitio 2.1.6 (Block Sesitivity). For a block B [] of bit positios ad = x, let x (B) be the strig x with all bits idexed by B flipped. Defie the block sesitivity bs x ( f ) for f o a particular iput x as bs x ( f ) = max{b : {B 1, B 2,..., B b } is a set of disjoit blocks such that i [b] : f (x) f (x (B i ) )}. That is to say, bs x ( f ) is the maximum umber of disjoit blocks that are each sesitive for f o iput x. We the defie the block sesitivity of f as bs( f ) = max bs x ( f ). x Note that sice block sesitivity is just a geeralizatio of sesitivity (which requires that the blocks be of size 1 each), we have bs x ( f ) s x ( f ), ad thus bs( f ) s( f ). Before we go o to develop some of the theory behid this, cosider the followig example. Example 2.1.7. Cosider a fuctio f (x) defied as follows, where the iput x is of legth ad is arraged i a matrix. Assume is a eve iteger. Defie { 1, if some row of the matrix matches 0 f (x) = 110 0, otherwise. Sice each row ca have at most 2 sesitive bits, we have s( f ) = 2 = O( ). However, for block sesitivity we have bs( f ) bs z ( f ) 2 where z = 0 ad we have 2 blocks each cosistig of 2 cosecutive bits, which makes every block sesitive. This gives a quadratic gap betwee s( f ) ad bs( f ). Theorem 2.1.8. s( f ) bs( f ) C( f ) D( f ). Proof. The leftmost ad rightmost iequalities have already bee established above. To see that the middle iequality holds, observe that every certificate must expose at least oe bit per sesitive block. Here we ote that there could be at least a quadratic gap betwee s( f ) ad bs( f ) as evideced by example 2.1.7. It turs out that there is a maximum of a quadratic gap betwee C( f ) ad D( f ), which leads us to the ext theorem, whose proof we will see as a corollary later i this sectio. Theorem 2.1.9 (Blum-Impagliazzo). D( f ) C( f ) 2 Example 2.1.10. Cosider a fuctio g = AND OR. That is to say, g divides the iput x with x = ito groups, feeds each group ito the subfuctio OR, ad the feeds the results of each OR ito AND. To certify 0, we eed oly expose all of the iputs i oe OR-subtree to show that they are all 0. To certify 1, we eed oly expose oe iput i each of the OR-subtrees to show that there is at least a sigle 1 goig ito every OR. The, we have C(g) = however, it s easy to see that give ay subset of the iput, the fial aswer still depeds o the rest of the iput, ad thus D(g) =. Therefore, g is evasive. 7

LECTURE 2. BOOLEAN DECISION TREES 2.1.1 Degree of a Boolea Fuctio Defiitio 2.1.11 (Represetative Polyomial). For a fuctio f : {0, 1} {0, 1}, we say that a polyomial p(x 1, x 2,..., x ) represets f if for all a {0, 1}, we have p( a) = f ( a). Defiitio 2.1.12 (Degree of a Boolea Fuctio). For a fuctio f : {0, 1} {0, 1} ad a represetative polyomial p for f, the degree of f is the just the degree of p. Theorem 2.1.13. For ay such f, there exists a uique multiliear polyomial that represets f. Proof. Let { a 1, a 2,..., a k } be the set of all iputs that make f = 1. The, we claim that for all b, c {0, 1}, there exists a polyomial p b (x 1,..., x ) such that { 1 if c = pb ( c) = b 0 otherwise To see this, for ay such vector b, cosider the polyomial p b (x 1,..., x ) = (x i + b i 1)) Clearly this polyomial is 1 if ad oly if every x i = b i. The, it follows that the polyomial that represets f is just the polyomial k p = i=1 p ai i=1 Sice each a i is distict, at most oe p ai will evaluate to 1, but sice that implies that the iput is oe of the acceptig iputs for f, p behaves as desired. To prove uiqueess, suppose 2 multiliear polyomials p, q both represet f, the the polyomial r = p q satisfies r( a) = 0 for all a {0, 1}. Cosider the smallest moomial x i1, x i2,..., x ik i r. FINISH UNIQUENESS PROOF. Theorem 2.1.14. For a Boolea fuctio f, let p be its represetative polyomial. The we have deg(p) D( f ) Proof. For ay leaf λ of a decisio tree, without loss of geerality let x 1, x 2,..., x r be the queries made alog the path to that leaf, ad let b 1, b 2,..., b r be the results of the queries. The for every leaf λ of a decisio tree for f, write a polyomial p λ give as follows p λ = x i (x i 1) i:b i =1 Clearly each p λ has degree r D( f ), the the polyomial i:b i =0 p = λ p λ represets f ad also has degree D( f ). Example 2.1.15. We preset a example to show that deg( f ) ca be much smaller tha s( f ). Cosider the fuctio o a iput vector x with size { NAE(R-NAE( x 1,..., x R-NAE( x) = 3 ), R-NAE( x 3,..., x 2 3 ), R-NAE( x 2,..., x )) if x > 3 3 NAE( x 1, x 2, x 3 ) if x = 3 where we assume that = 3 k for some iteger k, ad NAE is defied as follows { 1 if x, y, z are ot all equal NAE(x, y, z) = 0 if x = y = z 8

LECTURE 2. BOOLEAN DECISION TREES I other words, this is the k-layer tree of NAE computatios applied recursively to 3 k iitial iputs. For x = 0, we have s x (R-NAE) = 3 k, sice chagig ay sigle bit will flip the aswer. Now observe that we ca alteratively write the NAE(x, y, z) fuctio as a degree 2 polyomial NAE(x, y, z) x + y + z xy yz zx The, we ca see that at the secod lowest level we have 3 k 1 polyomials, each of degree 2 (lowest level cotais 3 k polyomials of degree 1 - coordiates of the iput vector), but at the root, we have a sigle polyomial of degree 2 k, but this is polyomially distict to the sesitivity s x (R-NAE) = 3 k. Alteratively, writig this i terms of, we have deg( f ) = log 3 2 s( f ) = Now returig to the example g = AND OR, we have deg( f ) = s( f ) = C( f ) To see why deg( f ) =, simply recall that o matter the iput, we must always examie every bit of the iput to get a defiitive aswer. I other words, the decisio tree has costat height D(g) = (refer to example 2.1.10). Theorem 2.1.16 (Beals-Buhrma-Cleve-Marca-deWolf). D( f ) C 1 ( f )bs( f ) C 0 ( f )bs( f ) Proof. Without loss of geerality we oly cosider the first case, that D( f ) C 1 ( f )bs( f ), as the case for C 0 ( f )bs( f ) is symmetric. We proceed to iduct o the umber of variables, or size of the iput. The base case = 1 is trivial. Before we cotiue with the iductio step, we defie a subfuctio of f as follows f α : {0, 1} k {0, 1}, with bits i α set to their values i α where α {0, 1, } ad exposes k bits, or has exactly k characters {0, 1}. If f has o 1-certificate, the f = 0, ad D( f ) = 0, so we re doe. FINISH BEALS BUHRMAN PROOF Theorem 2.1.17 (Nisa). C( f ) s( f )bs( f ) Proof. Try it yourself. Corollary 2.1.18. D( f ) C( f )bs( f ) s( f )bs( f ) 2 bs( f ) 3 Proof. Direct result of theorems 2.1.17 ad 2.1.16. Corollary 2.1.19. D( f ) C( f ) 2 Proof. D( f ) mi{c 0 ( f ), C 1 ( f )} bs( f ) mi{c 0 ( f ), C 1 ( f )} C( f ) = mi{c 0 ( f ), C 1 ( f )} max{c 0 ( f ), C 1 ( f )} = C 0 ( f ) C 1 ( f ) C( f ) 2 9

Lecture 3 Degree of a Boolea Fuctio Scribe: Ragaath Kodapally Theorems: 1. S( f ) bs( f ) C( f ) D( f ) 2. deg( f ) D( f ) 3. C( f ) S( f )bs( f ) bs( f ) 2 [Nisa] 4. D( f ) C 1 ( f )bs( f ) [Beals et al] 5. Corollary: D( f ) C 0 ( f )C 1 ( f ) C( f ) 2 [Blum-Impagliazzo] 6. Corollary: D( f ) bs( f ) 3 7. bs( f ) 2 deg( f ) 2 8. D( f ) bs( f ) deg( f ) 2 2 deg( f ) 4 Examples: 1. f : S( f ) = ( ), bs( f ) = () 2. f : S( f ) = bs( f ) = C( f ) =, deg( f ) = D( f ) = 3. f : deg( f ) = log 3 2, S( f ) = Theorem 7: Proof. Pick a iput a ( {0, 1} ) ad blocks B 1, B 2,..., B b that achieve the max i bs( f ) = b. Let P(x 1,..., x ) be a multi-liear polyomial represetatio of f. Create a ew polyomial Q(y 1, y 2,..., y b ) from P by settig x i s as follows If i B 1 B 2... B b, set x i = a i 10

LECTURE 3. DEGREE OF A BOOLEAN FUNCTION If i B j, If a i = 1, set x i = y j If a i = 0, set x i = 1 y j Multi-liearize the polyomial deg(q) deg(p) (Note: deg(q) b) Q(1, 1, 1,..., 1) = P( a ) = f ( a ) Q(0, 1, 1,..., 1) = Q(1, 0, 1,..., 1) = Q(1, 1,..., 0) = 1 f ( a ) (Because each B j is sesitive) Q( c ) {0, 1} c {0, 1} b [Trick by Misky-Papert] Defie Q sym (k) as follows: Q sym (k) = a =k Q( a ) ( b k), k {0, 1, 2,..., b} where a (weight of a ) is the umber of oes i a. Note: deg(q sym ) b These defiitios defie Q sym uiquely. Markov s Iequality: Q sym (k) = f ( a ), if k = b = 1 f ( a ), if k = b 1 [0, 1], else Defiitio 3.0.20 (Iterval Norm of a fuctio). Defie f S = sup x S f (x), for f : R R, S R. Suppose f is a polyomial of degree, the f [ 1,1] 2 f [ 1,1] Corollary: If a x b, c f (x) d, the f [a,b] 2 b a d c Let λ = (Q sym ) [0,b] Let κ = max{(sup Q sym (x)) 1, if Q sym (x), 0} The Q sym (x), for x [0, b] maps to [ k, 1 + k]. If κ = 0, Q sym maps [0, b] to [0, 1]. Markov s Iequality implies (Q sym ) [0,b] deg(qsym ) 2 (1 0) (b 0) deg(qsym ) 2 bs( f ) However, Q sym (b 1) = 1 f ( a ) ad Q sym (b) = f ( a ). By Mea-Value theorem, α [b 1, b] such that (Q sym ) (α) = 1. so, (Q sym ) [0,b] 1 bs( f ) deg(q sym ) 2 If κ > 0 If κ = Q sym (x) 1 for some x = x 0. Suppose x 0 [i, i + 1]. Cosiderig the smaller of the two itervals [i, x 0 ], [x 0, i + 1] ad applyig mea-value theorem, (Q sym ) [0,b] κ 1/2 = 2κ 11

LECTURE 3. DEGREE OF A BOOLEAN FUNCTION Q sym maps [0, b] to [ k, 1 + k]. Markov s Iequality implies, 2κ (Q sym ) [0,b] deg(qsym ) 2 (1 + κ ( κ)) b 0 So, 2κ deg(qsym ) 2 (1 + 2κ) bs( f ) bs( f ) deg(q sym ) 2 ( 1 + 1) (1) 2κ Usig α such that (Q sym ) (α) = 1, we have 1 (Q sym ) [0,b] deg(qsym ) 2 (1 + 2κ) bs( f ) bs( f ) deg(q sym ) 2 (1 + 2κ) (2) (1)&(2) bs( f ) deg(q sym ) 2 (1 + mi{ 1 2κ, 2κ}) 2 deg(q sym ) 2 2 deg(q) 2 2 deg(p) 2 = 2 deg( f ) 2 The proof is similar for the case where κ = Q sym (x) for some x. Theorem 8: D( f ) bs( f ) deg( f ) 2 Proof. Cosider polyomial represetatio P of f. Defiitio 3.0.21. Defie maxoomial = maximum degree moomial of f There exists a set (size bs( f ) deg( f )) of variables that itersects every maxoomial. Query all these variables to get subfuctio f α with deg( f α ) deg( f ) 1. Sice all the maxoomials of f vaish after exposig the variables, the degree of the resultig polyomial represetatio ( f α ) decreases at least by 1. If the above two statemets hold, the we ca prove the theorem by iductio o the degree of the fuctio f. If f is a costat fuctio, the the theorem holds trivially. Assume that it is true for all fuctios whose degree is less tha = deg( f ). D( f ) bs( f ) deg( f ) + D( f α ) The algorithm to get those variables is bs( f ) deg( f ) + bs( f α ) deg( f α ) 2 bs( f )(deg( f ) + (deg( f ) 1) 2 ) By iductio hypothesis bs( f )deg( f ) 2 if deg( f ) 1 12

LECTURE 3. DEGREE OF A BOOLEAN FUNCTION Start with a empty set S Pick all variables i ay maxoomial that is ot yet itersected ad add these variables to S. (Number of variables added is less tha or equal to deg( f ).) If the curret set itersects every maxoomial stop else repeat the above step. Claim: Number of iteratios i the algorithm is at most bs( f ) Proof: We claim that iside every maxoomial, there is a sesitive block for 0. Pick ay maxoomial. Set x i (i maxoomial) to 0. The resultig fuctio s polyomial represetatio still cotais the maxoomial. Thus, the resultig boolea fuctio is ot a costat. This implies that there exists a sub-block iside the maxoomial such that flippig it chages the value of the fuctio. Thus, the umber of maxoomials is at most bs( f ). Hece, the umber of iteratios which ca be at most the umber of maxoomials is at most bs( f ). 13

Lecture 4 Symmetric Fuctios ad Mootoe Fuctios Scribe: Ragaath Kodapally Defiitio 4.0.22. f : {0, 1} {0, 1} is said to be symmetric if x {0, 1} π S (Group of all permutatios o []) f (x) = f (x π(1), x π(2),..., x π() ) This is the same as sayig f (x) depeds oly o x (umber of oes i x). Recall the proof of bs( f ) 2 deg( f ) 2. We used symmetrizig trick. Ca prove: For ay symmetric f, deg( f ) = O( α ) where α(= 0.548) < 1 is a costat. [Vo Zur Gathe & Roche] The, D( f ) deg( f ) = O() For x, y {0, 1} write x y if i(x i y i ) Defiitio 4.0.23. f is mootoe if x y f (x) f (y) Example of mootoe fuctios: O R, AN D, AN D O R. No-examples: P AR, O R. Theorem 4.0.24. If f is mootoe, the S( f ) = bs( f ) = C( f ) Proof. We already have S( f ) bs( f ) C( f ). Remais to prove: C( f ) S( f ). Let x {0, 1}. Let α be a miimal 0-certificate matchig x (suppose f (x) = 0). Claim: All exposed bits of α are zeros. Proof: If ot, we ca chage a exposed bit (which is 1) i α ad still have f (x α ) = 0 as f is mootoe. So, we do t eed to expose that bit. So, α ca t be miimal 0-certificate. Claim: Every exposed bit i α is sesitive for [α : 1 ] = y (the iput obtaied by settig *=1 everywhere i α) Proof: Suppose i th bit is exposed by α but f (y (i) ) = f (y) = 0. By mootoicity, ay other settig of * s i α causes f = 0. So, i th bit is ot ecessary, cotradictig the miimality of α. By secod claim: S( f ) umber of exposed bits i α. Thus, S( f ) C 0 ( f ). Similarly, S( f ) C 1 ( f ). Therefore, S( f ) max{c 0 ( f ), C 1 ( f )} = C( f ) Theorem 4.0.25. If f is mootoe, D( f ) bs( f ) 2 = S( f ) 2 [Usig a earlier theorem: D( f ) C( f )bs( f ) = S( f ) 2 ] The above bouds are tight, cosiderig f = AN D O R. 14

LECTURE 4. SYMMETRIC FUNCTIONS AND MONOTONE FUNCTIONS The oly possible symmetric mootoe fuctios are T H R,k where T H R,k (x) =1, if x k D(T H R,k ) = (k 0) (by a simple adversarial argumet). Defiitio 4.0.26. f : {0, 1} {0, 1} is evasive if D( f ) =. 0, Otherwise Defiitio 4.0.27. A boolea fuctio f (x 1,2, x 1,3,..., x 1, ) is a graph property if f ( x ) depeds oly o the graph described by x. Every π S iduces a permutatio o ( []) 2. f is a graph property iff it is ivariat with respect to these permutatios. Theorem 4.0.28. Every mootoe o-costat graph property f o -vertex graph has D( f ) = ( 2 ) [Rivest & Vuillemi] Cojecture: D( f ) = ( 2), i.e f is evasive! [Richard Karp]. Theorem 4.0.29. If = p α for a prime p, α N, the f is evasive. [Kah-Saks-Sturtevat] Theorem 4.0.30. If f is ivariat uder edge-deletio or cotractio (mior closed property) ad 0, the f is evasive. [Amit-Khot-Shi] Theorem 4.0.31. Ay mootoe bipartite graph property is evasive. [Yao] Proof. Let f : {0, 1} N {0, 1} where N = ( 2) be a o-costat mootoe graph property. f is ivariat uder a group G S N of permutatios (G = S ). G is a trasitive o {1,..., N} i.e, for ay i, j [N], π G such that π(i) = j. Lemma 1: Suppose f : {0, 1} d {0, 1} is mootoe ( costat), ivariat uder a trasitive group ad d = 2 α for some α N, the f is evasive. Corollary to Lemma 1: If = 2 k, the D( f ) 2 /4. Hierarchically cluster [] ito subclusters of size /2 ad recurse. Cosider the subfuctio of f obtaied the followig way: g i : A {0, 1} ad g(x) = f (x), where A = {x is a graph o [] where there is a edge betwee l, j if l 1 i 0 k = j 1 i 0 k ad o edge if l 01 i 1 0 k j 01 i 1 0 k } Number of iputs to subfuctio = (2 k 1 ) 2 = 2 /4. By Lemma 1, D( f ) D(subfuctio) = 2 /4. There are k subfuctios, depedig o how deep we take the clusterig: g 1 (stop after oe level of clusterig),g 2,..., g k (go dow to sigleto clusters). g k ( 0 ) = f ( 0 ) = 0 g 1 ( 1 ) = f ( 1 ) = 1 g i ( 1 ) = g i 1 ( 0 ) i such that g i ( 0 ) = 0 ad g i ( 1 ) = 1 [i.e g i is o-costat] Corollary 2: For all N, D( f ) 2 /16 Proof of Lemma 1: Cosider S k = {x {0, 1} : x = k ad f (x) = 1} Defiitio 4.0.32. orb(x) = {y : y is obtaied from x by applyig some π G} S k is partitioed ito orbits. Claim: If k 0, k d( i.e x 0, x 1 ), the every orbit has eve size. Therefore, S k is eve. Number of oes i resultig matrix = k orb(x) = d (#1 s i ay colum) orb(x) = d ( #1 s i ay col) k 15 = 2α (some iteger) k = eve

LECTURE 4. SYMMETRIC FUNCTIONS AND MONOTONE FUNCTIONS as 0 < k < d. By Claim: {x : f (x) = 1} = S 0 + S 1 +... + S d 1 + S d S 0 = 0 as f ( 0 ) = 0 ad S d = 1. = 0 + eve umber + 1 = odd Cosider all x {0, 1} that reach leaf λ of decisio tree at depth < d. A eve umber of x s reach here. Therefore, if all leaves whose output is f (x) = 1 have depth < d, the umber of x : f (x) = 1, is eve. This is a cotradictio ad hece, there exists a leaf whose depth is d. Thus, D( f ) = d ad f is evasive. 16

Lecture 5 Radomized Decisio Tree Complexity: Boolea fuctios We saw this type of complexity earlier for the sortig problem. Scribe: Priya Nataraja Defiitio 5.0.33 (Cost of a radomized decisio tree o iput x). cost(t, x) = umber of bits of x queried by t. Let radomized decisio tree T λ (probability distributio over decisio trees). From Yao s miimax lemma we have: R( f ) D µ ( f ), for ay iput distributio µ. So, it is eough to prove a lower boud o D µ ( f ). Ulike what we did i sortig, here we wo t covert to a Mote Carlo algorithm. We will strictly use Las Vegas algorithm. Theorem 5.0.34. Let f be a o-costat mootoe graph property o N = ( 2) bits, i.e., vertices. The, R( f ) = ( 4/3 ) = (N 2/3 ). This is sayig somethig more ivolved tha the easy thig you ca prove: for ay mootoe f : R( f ) D( f ). Last time we showed that D( f ) 2 /16 = ( 2 ) [Rivest-Vuiellemi]. Yao s cojecture: R( f ) = ( 2 ) = (N). Theorem 5.0.34 is ot the best result kow, we state below (without proof) the best result kow: R( f ) = ( 4/3 log 1/3 ) [Chakrabarti-Khot], this is a stregtheig of Hajal s proof. Today we will prove somethig that implies Theorem 5.0.34. Recall that graph properties are ivariat uder certai permutatios (ot all permutatios) ad these permutatios form a trasitive group. Note: From ow o, umber of iputs = (ot N). Theorem 5.0.35. If f : {0, 1} is ivariat uder a trasitive group of permutatios, the R( f ) = ( 2/3 ). Match the R( f ) boud i Theorem 5.0.34 to that i Theorem 5.0.35! Proof of Theorem 5.0.35 usig probabilistic aalysis. Proof. Defiitio 5.0.36 (Ifluece of a coordiate o a fuctio). Let f : {0, 1} {0, 1}. Defie the ifluece of the i th coordiate o f as: I i ( f ) = Pr x [ f (x) f (x (i) )]. I words: choose a radom x ad what is the chace that flippig the i th bit will chage the output. Note that I i ( f ) looks like s x ( f ). Because of ivariace uder the trasitive group, we have: I 1 ( f ) = I 2 ( f ) = = I ( f ) = I ( f )/, where I ( f ) = i=1 I i ( f ). We will also make use of aother techique: 17

LECTURE 5. RANDOMIZED DECISION TREE COMPLEXITY: BOOLEAN FUNCTIONS Defiitio 5.0.37 (Great Notatioal Shift). TRUE: 1 1; FALSE: 0 1 So, old x ew 1 2x, ad old 1 y 2 ew y Why are we doig this? Give fuctio f : { 1, 1} { 1, 1}, this is a sig vector istead of a bit vector. I this otatio also we have a uique multiliear polyomial. It is importat to ote that this otatio does ot chage the degree of the multiliear polyomial, though it might chage the coefficiets ad the umber of moomials. Example 5.0.38. AND(x, y, z) = x yz (old) I the ew otatio we have: Notice that the coefficiets of the liear terms = 2. ( ) ( ) ( ) 1 x 1 y 1 z AND(x, y, z) = 1 2 2 2 2 } {{ } ew old } {{ } old ew = 3 4 + x 4 + y 4 + z 4 xy 4 yz 4 xz 4 + xyz 4 Give that x takes ±1 values, we ca say somethig specific about I i ( f ): If f : { 1, 1} { 1, 1}, the: [ ] 1 I i ( f ) = E x 2 f (x x i = 1) f (x x i = 1) = 1 2 E x[ D i f (x) ] Let φ(x, y, z) = 3 4 + x 4 + y 4 + 4 z xy 4 yz 4 xz 4 + xyz 4 D 1 φ(y, z) = 2. 1 4 + higher degree terms E y,z [D 1 φ(y, z)] = 2. 1 4 + 0 1 2 E[φ(y, z)] = coefficiet of x For mootoe f : I i ( f ) = 1 2 E x[d i f (x)] = coefficiet of x i i polyomial represetatio of f Lemma1: For ay f : { 1, 1} { 1, 1} ad determiistic decisio tree T evaluatig f, we have: Var x [ f ] i=1 δ i I i ( f ), where δ i = Pr x [T queries x i o iput x]. Var x [ f ] = E x [ f (x) 2 ] (E[ f (x)]) 2 = 1 E x [ f (x)] 2 Note: If f is balaced (i.e., if Pr[ f (x) = 1] = Pr[ f (x) = 1]), the Var[ f ] = 1. For example, PARITY fuctio is balaced, AND is ot. We will prove Theorem 5.0.35 assumig f is balaced; we will fix this later. Recall that Lemma1 makes o assumptios about the fuctio f. If f is ivariat uder a trasitive group, the I i ( f ) = I ( f ), so from Lemma1 we get: Var[ f ] I ( f ) δ i i=1 = I ( f ) E[# variables queried] = I ( f ) D uiform( f )[by pickig optimal determiistic decisio tree for uiform distributio]. 18

LECTURE 5. RANDOMIZED DECISION TREE COMPLEXITY: BOOLEAN FUNCTIONS Lemma2 [O Doell-Servedio]: For ay mootoe f : { 1, 1} { 1, 1}, I ( f ) D uif ( f ). From Lemma1 ad Lemma2, we get that if f is mootoe, trasitive, ad balaced, the: So ow we will prove lemmas 1 ad 2. Proof of Lemma1: 1 I ( f ) D uif( f ) D uif( f ) 3/2 R( f ) D uif ( f ) 2/3 Var[ f ] δ i I i ( f ) i=1 Var[ f ] = 1 E[ f (x)] 2 = 1 E x [ f (x)].e y [ f (y)] = 1 E x,y [ f (x) f (y)] = E x,y [1 f (x) f (y)] = E x,y [ f (x) f (y) ] That is, for ay sequece of radom variables {0, 1}, x = u [0], u [1],..., u [d] = y: Var[ f ] d 1 i=0 E[ f (u[i] ) f (u [i+1] ) ] (by triagle iequality) For example: Start: f (x 1, x 2, x 3, x 4, x 5 ) [= x = u [0] ] f (x 1, y 2, x 3, x 4, x 5 ) [tree T read x 2 i the above step] f (x 1, y 2, x 3, y 4, x 5 ) [tree T read x 4 above] Ad ow suppose the tree has reached a leaf so it outputs a value of f ad stops. Whe the tree stops, replace the remaiig x s with y s. So we get f (y 1, y 2, y 3, y 4, y 5 ) Note that goig from u [i] to u [i+1] is just re-radomizig, there is o coditioig ivolved because the variable which was queried (whose value got kow) got replaced by a radom variable. Expressio i summatio above is just E u [ f (u) f (u ( j) ) ] where x j is the variable queried by T at this step. Note that this looks a lot like the defiitio of ifluece, without the 1/2. E u [ f (u) f (u ( j) ) ]: read this as take a radom variable u ad re-radomize u by replacig the j th variable by tossig a coi agai. Sice with probability 1/2, the value of the j th variable got flipped or remaied the same, we get: E u [ f (u) f (u ( j) ) ] = 1 2 E u[ f (u) f (u ( j) ) ] = I j (u) So, if x j1, x j2,..., x jd is the radom sequece of variables read (ote that d is also radom), the: Var[ f ] (I j1 ( f ) + I j2 ( f ) +... + I jd ( f )) Pr[j 1, j 2,..., j d occurs] = I i ( f ) Pr[T queries x i ] i=1 Proof of Lemma2: Recall that for mootoe f, I i ( f ) = 2 1 E x[d i f (x)], which is also the coefficiet of x i i the represetig polyomial. Let T be a determiistic decisio tree for f ad suppose T outputs ±1 values. Let L = {leaves of T } ad L + = {leaves of T that output 1}, i.e., acceptig leaves. For each leaf λ L, we have a polyomial p λ (x 1, x 2,..., x ) such that { 1, if (x1, x p λ (x 1, x 2,..., x ) = 2,..., x ) reaches λ 0, otherwise 19

LECTURE 5. RANDOMIZED DECISION TREE COMPLEXITY: BOOLEAN FUNCTIONS deg(p λ ) = depth of λ. coefficiet of x i i p λ =. 2 depth(λ) 1 { +1, if upo readig xi o path to λ, we brach right Sig of coefficiet = 1, otherwise Notatios: d(λ) = depth(λ) s(λ) = skew of λ Usig the above defiitios, we have: 1 = (#left braches #right braches) o path to λ I (p λ ) = (coefficiet of x i i p λ ) i=1 = s(λ) 2 d(λ) 1 Observe that represetig polyomial of f = 1 2 λ L + p λ. Therefore, I ( f ) = (coefficiet of x i i represetig polyomial of f) i=1 = λ L + λ L s(λ) 2 d(λ) s(λ) 2 d(λ) = E radom brachig [ s(λ) ] I ( f ) λ d(λ) 2 d(λ) = D ui f ( f ). The first iequality is due to drukard s walk i probability theory. Lecture dated 04/25/2008 follows: Today we will prove somethig for graph properties, which will sometimes be stroger tha what we proved last time: 1 E[ f (x)] 2 = V ar[ f (x)] i=1 δi T I i ( f ) I ( f ) D ui f ( f ) D ui f ( f ) 3/2. The first iequality is due to Lemma1 of last time, the secod iequality is due to symmetry, ad the last iequality is due to mootoicity ad Lemma2. If f is balaced, we get D ui f ( f ) 2/3. I the proof we did last time, we did ot use the full power of uiform distributio; we just used the fact that idividual poits are idepedet. If f is ot balaced, we get D ui f ( f ) (some value close to 0), which is ot a good lower boud. For geeral f, istead of assumig X ui f, we use X µ p, where µ p meas choose each x i = 1 with probability p / 1 with probability 1 p idepedetly. µ p (x 1, x 2,..., x ) = p N1(x) (1 p) N 1(x) Now, we will eed to have the followig modificatios: p(1 p) Dµp ( f ) 3/2 1 E[ f (x)] 2 = V ar[ f (x)] i=1 δi,p T I i,p( f ) I ( f ) D µp ( f ), where we get the last iequality due to geeralized drukard walk. If p = 0, E[ f (x)] = 1; if p = 1, E[ f (x)] = 1. So, itermediate value theorem tells us that p such that E[ f (x)] = 0. So we pick this p ad we get a boud for D µp 2/3. Although this boud is for D µp, Yao s miimax lemma tells us that for ay distributio, D µp is a lower boud for R( f ). This probability p where E[ f (x)] = 0 is called critical probability of f. Every mootoe fuctio has a well-defied critical probability. Today s theorem will be i terms of this critical probability. 20

LECTURE 5. RANDOMIZED DECISION TREE COMPLEXITY: BOOLEAN FUNCTIONS Theorem 5.0.39. If f is a o-costat, mootoe graph property o -vertex graphs, the R( f ) = (mi{ p, 2 log }), where p is the critical probability of f. Note that we ca always assume p 1 2. Why? Because if the critical probability of f > 1/2, we ca always cosider g( x ) = 1 f (1 x ), the p (g) = 1 p ( f ). Critical probability p is the probability that makes E[ f (x)] = 1/2. Let us ow defie the key graph theoretic property graph packig that we eed for the proof. Graph packig is the basis for a lot of lower bouds for graph properties. Defiitio 5.0.40 (Graph Packig). Give graphs G, H with V (G) = V (H), we say that G ad H pack if bijectio φ : V (G) V (H) such that {u, v} E(G), {φ(u), φ(v)}otie(h). φ is called a packig. We state the followig theorem, but we will ot prove it: Theorem 5.0.41 (Sauer-Specer theorem). If V (G) = V (H) = ad (G) (H) 2, the G ad H pack. Here (G) = max degree i G. Note: you ca ever pack a 0-certificate ad a 1-certificate. What is the graph of a 0-certificate? It is a buch of thigs that are ot edges i the iput graph. Ituitio: Sauer-Specer theorem says that if two graphs are small, you ca pack them. Sice we caot pack a 0-certificate ad a 1-certificate, it implies that either of the two has to be big so that certificate complexity is big which gives you a hadle o the lower boud. Cosider a decisio tree; pick a radom iput x {0, 1} accordig to µ p ad ru the decisio tree o x. Let Y = #1 s read i the iput, ad Z = #0 s read i the iput. The, cost of the tree o radom iput = Y + Z. D µp ( f ) = E[cost] = E[Y ] + E[Z] By Yao s miimax lemma, it is eough to prove that either E[Y ] or E[Z] is (mi{ p, log 2 }). Note that Y ad Z are o-egative radom variables (sice they represet the umber of 0 s ad 1 s read), so it is eough to prove the lower boud o oe of them. Proof outlie (remember we ca assume p 1/2): Claim 1: E[Z] E[Y ] = 1 p p. if E[Y ] 64 the: E[Z] = 1 p p 64 128p = (/p) (mi{ p, 2 log }) we may assume E[Y ] 64. E[Y f (x) = 1 (G x ) p+ E[Y ] 4p log ], where G x = graph represeted Pr[ f (x)=1 (G x ) p+ 4p log ] by 1 s i the iput x (so other edges are off ). Theorem 5.0.42. E[X c] E[X] Pr[c] Proof. E[X] = E[X c] Pr[c] + E[X c] Pr[ c] E[X c] Pr[c] 21

LECTURE 5. RANDOMIZED DECISION TREE COMPLEXITY: BOOLEAN FUNCTIONS Pr[A B] = 1 Pr[A B] 1 Pr[A] Pr[B] = Pr[A] Pr[B] Usig our assumptio, ad the iequality above, we get: E[Y f (x) = 1 (G x ) p + E[Y ] 4p log ] Pr[ f (x)=1...] /64 1 /32 + o() 2 1 Later: The costat 4 above is chose so that Pr[ (G x ) p + 4p log ] 1 1 (Cheroff boud) This implies that there exists a iput, x, satisfyig f (x) = 1 ad (G x ) p + 4p log, such that y /32 + o(). Therefore, there exists a 1-certificate with p + 4p log ad #edges /32 + o(). This meas that we are touchig at most /16 vertices, ad so at least /2 vertices are isolated. Claim 2: All 0-certificates must have at least 2 edges. 16(p+ 4p log ) Now we have a lower boud o the umber of edges of ay 0-certificate. As we did E[Y ], we ca say E[Z] E[Z f (x) = 0] Pr[ f (x) = 0] 2 16(p + 4p log ) 1 2 = 2 32(p + 4p log ) { } 2 mi 64p, 2 256 log { } mi 64p, 2 256 log Claim 2 Proof: This proof amouts to showig that if the claim were false, we could pack a 1-certificate ad 0-certificate. We will show that if (G) k ad G has more that /2 isolated vertices ad E(H) 16 (G) 2, the G ad Hpack. Number of edges i a graph equals half the sum of the degrees of the vertices, i.e., E(H) = 1 2 deg(v) v H = d avg(h) 2 d avg (H) = 2 E(H) = 8 (G) If H is a subgraph of H spaed by /2 lowest degree vertices, the (H ) 2 d avg (H).. Now, if you have a upper boud o the average, you ca boud the max of /2 smallest items. 22

LECTURE 5. RANDOMIZED DECISION TREE COMPLEXITY: BOOLEAN FUNCTIONS By packig H H with the isolated vertices of G, we ca exted a packig of H ad G to oe oe H ad G. We ca get a packig of H ad G because, (G ) (H ) (G) (H ) Apply Sauer-Specer theorem. Claim 1 Proof: Defie a buch of idicator variables. Let, 4 = /2 2. Y i = Z i = { 1, if xi is read as a 1. 0, otherwise { 1, if xi is read as a 0. 0, otherwise The, ( 2) ( 2) Y = Y i ; Z = Z i i=1 i=1 The it is eough to show that: i : { either E[Yi ] = E[Z i ] = 0 or E[Z i ] E[Y i ] = 1 p p Eough to show the same uder a arbitrary coditioig o x j ( j i) because x i s are idepedet. Apply a arbitrary coditioig of x j ( j i). If this coditioig implies that the i-th coordiate is ot read, it implies Y i = Z i = 0. Otherwise this is the oly coordiate that is left to be read; ad we kow Pr[1] = p ad Pr[0] = 1 p, i.e., E[Z i ] E[Y i ] = Pr[X i = 0] Pr[X i = 1] = 1 p. p What do you mea by eough to show uder arbitrary coditioig? If you have coditioig o rest of N 1 variables the we have C 1, C 2,..., C 2 N 1 coditios. This implies, E[Z i ] = E[Z i C 1 ] Pr[C 1 ] + E[Z i C 2 ] Pr[C 2 ] +... E[Y i ] = E[Y i C 1 ] Pr[C 1 ] + E[Y i C 2 ] Pr[C 2 ] +... We used a property of decisio trees that whether or ot you read ith variable is idepedet of the value of the ith variable; it may deped o what the values of other variables are. 23

Lecture 6 Liear ad Algebraic Decisio Trees Scribe: Umag Bhaskar 6.1 Models for the Elemet Distictess Problem Defiitio 6.1.1 (The Elemet Distictess Problem). Give items (x 1, x 2,, x ) from a uiverse U ot ecessarily ordered, does i j [] : x i x j? The complexity of the problem depeds o the model of computatio used. Model 0. Equality testig oly. This is a very weak model with o o-trivial solutios. The elemet distictess problem would require ( 2 ) comparisos. Model 1. Assume uiverse U is totally ordered. We ca compare ay two elemets x i ad x j ad take either of 3 braches depedig o whether x i > x j, x i = x j, or x i > x j. Note that i a computer sciece cotext this makes sese, as it is always possible to compare two objects by comparig their biary represetatios. I this model, the problem reduces to sortig, hece the elemet distictess problem ca be solved i O( log ) comparisos. I fact we have a matchig lower boud ( log ) for this problem, through a complicated adversarial argumet. Istead of describig the adversarial argumet we describe a stroger model. We the use this model to give us a lower boud, sice a lower boud for the problem i the stroger model is also a lower boud i this model. Model 2. We assume w.l.o.g. that the uiverse U = R with the usual orderig. The the compariso of two elemets x i ad x j is equivalet to decidig if x i x j is <, > or = 0. For this model we allow more geeral tests, such as is x 1 2x 3 + 4x 5 x 7 > 0, = 0, < 0 I geeral at a iteral ode v i the decisio tree, we ca decide whether p v (x 1, x 2,, x ) brach accordigly. Here p v (x 1, x 2,, x ) = a (v) 0 + 24 i=1 a (v) i x i.? > 0 = 0 < 0, ad

LECTURE 6. LINEAR AND ALGEBRAIC DECISION TREES 6.2 Liear Decisio Trees For model 2 as described above, we have the followig defiitio. Defiitio 6.2.1 (Liear Decisio Trees). A liear decisio tree is a 3-ary tree such that each iteral ode v ca > 0 decide whether p v (x 1, x 2,, x ) = 0, ad brach accordigly. The depth of such a tree is defied as usual. < 0 If we assume that our tree tests membership i a set, we ca cosider the output to be {0, 1}. The we ca thik of two subsets W 0, W 1 R where: 1. W i = { x R : tree outputs i} 2. W 0 W1 = 3. W 0 W1 = R Cosider a leaf λ of the tree. The the set S λ = {x R : x reaches λ} is described by a set of liear equality / iequality costraits, correspodig to the odes o the path from the root to the leaf. The S λ is a covex polytope, sice each liear equality / iequality defies a covex set, ad the itersectio of covex sets is itself a covex set. Covex sets are coected, i the followig sese: Defiitio 6.2.2 (Coected Sets). A set S R is said to be (path-) coected if a, b S, a fuctio p ab : [0, 1] S such that: p ab is cotiuous p ab (0) = a p ab (1) = b Suppose a liear decisio tree T has L 1 leaves that output 1 ad L 0 leaves that 0. The W 1 = λ outputs 1 For a set S, defie #S to be the umber of coected compoets of S. The, sice each leaf outputs at most a sigle coected compoet, S λ ad similarly, #W 1 L 1 #W 0 L 0 For a give fuctio f : R {0, 1} (for example, elemet distictess) we defie f 1 (1) = W 1 = { x : f (x) = 1}, ad f 1 (0) is similarly defied. The, ad, L 1 # f 1 (1) L 0 # f 1 (0) This implies for ay decisio tree that computes f, the umber of leaves L satisfies L = L 1 + L 0 # f 1 (1) + # f 1 (0) 25

LECTURE 6. LINEAR AND ALGEBRAIC DECISION TREES ad hece, height(t ) log 3 (# f 1 (1) + # f 1 (0)) We ow retur to the elemet distictess problem. Remember that { 1, if i j, xi x f (x 1, x 2,, x ) = j 0, o.w. It turs out that f 1 (1) has at least! coected compoets. Actually the correct umber is!, but we are oly iterested i the lower boud which we will prove. Lemma 6.2.3. σ, π S, the group of all permutatios, σ π, the poits x σ = (σ (1), σ (2),..., σ ()) ad x π = (π(1), π(2),..., π()) lie i distict coected compoets of f 1 (1). Proof. The proof is by cotradictio. Suppose ot, the there exists a path from x σ to x π lyig etirely iside f 1 (1). The, for permutatios σ, π, i j s.t. σ (i) < σ ( j) π(i) > π( j) Now x (σ ) ad x (π) lie o opposite sides of the hyperplae x i = x j. Hece ay path from x (σ ) to x (π) must cotai a poit x s.t. xi = x j, by the Itermediate Value Theorem ad f (x ) = 0, which gives us the required cotradictio. Suppose a liear decisio tree T solves elemet distictess. The D(elemet distictess) height(t ) log 3 (# f 1 (0) + # f 1 (1)) log 3 (!) = ( log ) Note that very little of this proof actually uses the liearity of the odes. We ca as well use other fuctios e.g. quadratic fuctios ad get the same results. 6.3 Algebraic Decisio Trees A algebraic decisio tree of order d is similar to a liear decisio tree, except the iteral odes evaluate polyomials of degree d, ad brach accordigly. The problem ow is that the compoets may o loger be coected, e.g. (x + y)(x y) > 0. However, it turs out that low degree polyomials are t too bad, as evideced by the followig theorem: Theorem 6.3.1 (Milor-Thom). Let p 1 (x 1, x 2,..., x m ), p 2 (x 1, x 2,..., x m ),..., p t (x 1, x 2,..., x m ) be polyomials with real coefficiets of degree k each. Let V = { x R m : p 1 ( x) = p 2 ( x) = = p t ( x) = 0} (called a algebraic variety. The the umber of coected compoets i V = #V k(2k 1) m 1. Milor cojectured that the lower boud for the umber of coected compoets i V should be k m. As a example, cosider the equalities p 1 (x) = (x 1 1)(x 1 2)... (x 1 k) = 0 p 2 (x) = (x 2 1)(x 2 2)... (x 2 k) = 0... p m (x) = (x m 1)(x m 2)... (x m k) = 0 The p 1 (x) = p 2 (x) = = p m (x) is satisfied at exactly k m poits, each of which forms a coected compoet. It follows from the theorem that each leaf gives at most k(2k 1) m 1 compoets. However, iteral odes are ot restricted to equalities ad ca evaluate iequalities. I the ext lecture we itroduce Be-Or s lemma, which hadles this case. We use this to get a lower boud for tree depth i such a model, i.e. where the iteral odes are allowed to perform algebraic computatios ad compute iequalities. 26

Lecture 7 Algebraic Computatio Trees ad Be-Or s Theorem Scribe: Umag Bhaskar 7.1 Itroductio Last time we saw the Milor-Thom theorem, which says that the followig Theorem 7.1.1 (Milor-Thom). Let p 1 (x 1, x 2,..., x m ), p 2 (x 1, x 2,..., x m ),..., p t (x 1, x 2,..., x m ) be polyomials with real coefficiets of degree k each. Let W = { x R m : p 1 ( x) = p 2 ( x) = = p t ( x) = 0}. The the umber of coected compoets i W = #W k(2k 1) (m 1). If istead of costat-degree polyomials we allowed arbitrary degree polyomials the it turs out that the elemet distictess problem could be doe i O(1) time! Cosider the expressio x = i< j (x i x j ) The x = 0 i j : x i = x j. Hece, this is obviously too powerful a model. I geeral such problems ca be posed as set recogitio problems, e.g. the elemet distictess ca be posed as recogisig the set W where W = { x R m : i, j (i j x i x j )}. Defie D li (W ) as the height of the shortest liear decisio tree that recogizes W. We saw i the last lecture that for this particular problem, D li (W ) log 3 (#W ). 7.2 Algebraic Computatio Trees Defiitio 7.2.1 (Algebraic Computatio Trees). A algebraic computatio tree (abbreviated ACT ) is oe with two types of iteral odes: 1. Brachig Nodes: labelled v:0 where v is a acestor of the curret ode 2. Computatio Nodes: labelled v op v where v, v are (a) acestors of the curret ode, (b) iput variables, or (c) costats 27