Malicious Omissions and Errors in Answers to Membership Queries

Transcription

1 Machine Learnin, 28, (1997) c 1997 Kluwer Academic Publishers. Manuactured in The Netherlands. Malicious Omissions and Errors in Answers to Membership Queries DANA ANGLUIN anluin@cs.yale.edu MĀRTIŅŠ KRIĶIS krikis@cs.yale.edu Department o Computer Science, Yale University, P.O. Box , New Haven, CT ROBERT H. SLOAN sloan@eecs.uic.edu Dept. o Electrical En. and Computer Science, 851 S. Moran St. Rm 1120, University o Illinois at Chicao, Chicao, IL GYÖRGY TURÁN u11557@uicvm.uic.edu Dept. o Math., Stat., and Comp. Sci., 851 S. Moran St. Rm 322, University o Illinois at Chicao, Chicao, IL 60607, Automata Theory Research Group Hunarian Academy o Sciences, Szeed Editor: David Haussler Abstract. We consider two issues in polynomial-time exact learnin o concepts usin membership and equivalence queries: (1) errors or omissions in answers to membership queries, and (2) learnin inite variants o concepts drawn rom a learnable class. To study (1), we introduce two new kinds o membership queries: limited membership queries and malicious membership queries. Each is allowed to ive incorrect responses on a maliciously chosen set o strins in the domain. Instead o answerin correctly about a strin, a limited membership query may ive a special I don t know answer, while a malicious membership query may ive the wron answer. A new parameter L is used to bound the lenth o an encodin o the set o strins that receive such incorrect answers. Equivalence queries are answered correctly, and learnin alorithms are allowed time polynomial in the usual parameters and L. Any class o concepts learnable in polynomial time usin equivalence and malicious membership queries is learnable in polynomial time usin equivalence and limited membership queries; the converse is an open problem. For the classes o monotone monomials and monotone k-term DNF ormulas, we present polynomial-time learnin alorithms usin limited membership queries alone. We present polynomial-time learnin alorithms or the class o monotone DNF ormulas usin equivalence and limited membership queries, and usin equivalence and malicious membership queries. To study (2), we consider classes o concepts that are polynomially closed under inite exceptions and a natural operation to add exception tables to a class o concepts. Applyin this operation, we obtain the class o monotone DNF ormulas with inite exceptions. We ive a polynomial-time alorithm to learn the class o monotone DNF ormulas with inite exceptions usin equivalence and membership queries. We also ive a eneral transormation showin that any class o concepts that is polynomially closed under inite exceptions and is learnable in polynomial time usin standard membership and equivalence queries is also polynomial-time learnable usin malicious membership and equivalence queries. Corollaries include the polynomial-time learnability o the ollowin classes usin malicious membership and equivalence queries: deterministic inite acceptors, boolean decision trees, and monotone DNF ormulas with inite exceptions. Keywords: Concept learnin, queries, errors 1. Introduction There is an impressive and rowin number o polynomial-time alorithms, many o them quite beautiul and inenious, to learn various interestin classes o concepts usin equiva-

2 212 D. ANGLUIN, ET AL. lence and membership queries. To apply such alorithms in practice, researchers need to overcome a number o problems. One siniicant issue is the problem o omissions and errors in answers to queries. Previous learnin alorithms in the equivalence and membership query model are uaranteed to perorm well assumin that queries are answered correctly, but there is oten no uarantee that the perormance o the alorithm will derade raceully i that assumption is not exactly satisied. Lan and Baum (1992) report that this problem derailed their attempt to apply Baum s alorithm or learnin neural nets rom examples and membership queries (Baum, 1991) to the problem o reconizin hand-written diits. The attempt ailed because the membership questions posed by the alorithm were too diicult or people to answer reliably. The alorithm typically asked membership queries on, say, a random-lookin blur midway between a 5 and a 7, and the humans bein queried ave very inconsistent responses. Studies in conitive psycholoy indicate that this is the norm; people are typically quite inconsistent in decidin where the precise boundary o a concept lies. (See, e.., Anderson (1980).) 1.1. Omissions and Limited Membership Queries This motivated us to introduce the limited membership query. A limited membership query may be answered either correctly, or with an omission, that is, a special value siniyin I don t know. The answers are persistent; that is, repeated queries about the same example are iven the same answer. The choice o the set o strins on which to ive answers o I don t know is assumed to be made by a malicious adversary. We introduce a new parameter L to quantiy the amount o omission it is a bound on the table-size o the set o strins on which the adversary answers I don t know. (The table-size o a set o strins is the number o strins in the set plus the sum o their lenths.) A polynomial-time learnin alorithm is permitted time polynomial in the usual parameters and L. For this model, we deine a hypothesis to be nonstrictly correct i it arees with the taret concept on all examples except possibly ones or which a limited membership query was answered I don t know. Thus, domain elements answered with I don t know are allowed to be classiied arbitrarily by the inal hypothesis o a learnin alorithm. This corresponds to the intuition that since a person could not be sure o the classiication o a blur between 5 and 7, it does not matter how the inal hypothesis classiies it. We ive a polynomial-time learnin alorithm usin just limited membership queries or the class o monotone monomials and a lower bound on the query complexity o this problem. We also ive a polynomial-time learnin alorithm in this model or the class o k-term monotone DNF ormulas. We also consider combinin limited membership queries with equivalence queries. We assume that the answers to equivalence queries remain correct, that is, any counterexample iven is truly a counterexample to the hypothesis o the learnin alorithm. In the nonstrict model, the equivalence query is answered yes i the hypothesis is nonstrictly correct; otherwise a counterexample must be returned rom amon examples not answered with I don t know. In the strict model, equivalence queries remain as usual, that is, an equivalence query is answered with yes i the hypothesis is exactly equivalent to the taret concept;

3 MALICIOUS OMISSIONS AND ERRORS 213 otherwise an arbitrary counterexample is returned (includin possibly an example previously answered with I don t know. ) We show that the same classes o concepts can be learned in polynomial time usin limited membership queries and equivalence queries in the nonstrict and strict models. We ive a polynomial-time alorithm to learn monotone DNF ormulas usin limited membership queries and equivalence queries in the nonstrict model Malicious Membership Queries and Errors In the case o limited membership queries, the answers that are not omissions are uaranteed to be correct. We also consider the situation in which the answers to membership queries may be wron. In a malicious membership query, the answer iven may be correct, or it may be an error. As in the case o limited membership queries, we use the parameter L to bound the table-size o the set o strins whose malicious membership queries are answered erroneously, the choice o that set o strins is assumed to be made by a malicious adversary, and the answers to queries are persistent. We assume that equivalence queries remain correct, which corresponds to the strict model introduced above. That is, the inal hypothesis o the learnin alorithm must be exactly equivalent to the taret concept. We ive a polynomial-time alorithm to learn monotone DNF ormulas usin malicious membership queries and equivalence queries. It is not diicult to see that any concept class learnable in polynomial time usin malicious membership queries and equivalence queries is learnable in polynomial time usin limited membership queries and equivalence queries, but the converse direction is an interestin open question. We exhibit a class o concepts or which the query complexity or equivalence and malicious membership queries is not bounded by any polynomial in the query complexity or equivalence and limited membership queries, but both complexities are exponential in the number o strins that receive incorrect answers Finite Exceptions A related issue is the assumption that the taret concept is drawn rom a particular class o concepts, or example, monotone DNF ormulas. Even i the taret concept is nearly a monotone DNF ormula, there is typically no uarantee that the learnin alorithm will do anythin reasonable. We approach this question by considerin inite variants o the concepts in a iven class, usin the table-size o the set o exceptions as a measure o how dierent the taret concept is rom one in the speciied class. We deine what it means or a concept class to be polynomially closed under inite exceptions. Some concept classes, or example, DFA s and decision trees, are polynomially closed under inite exceptions, while others, like monotone DNF ormulas, are not. For the latter, we deine a natural operation o addin exception tables to the concept class to make it polynomially closed under exceptions. We ive a polynomial-time learnin alorithm or the resultin class o monotone DNF ormulas with inite exceptions, usin equivalence queries and standard membership queries. We then ive a eneral transormation that shows that any class o concepts that is polynomially closed under exceptions and polynomial-time learnable usin equivalence que-

4 214 D. ANGLUIN, ET AL. ries and standard membership queries is also polynomial-time learnable usin equivalence queries and malicious membership queries. Corollaries include polynomial-time learnin alorithms usin equivalence queries and malicious membership queries or the concept classes o DFA s, decision trees, and monotone DNF ormulas with inite exceptions. The notion o a inite variant o a concept, that is, a concept with a inite set o exceptions, is a uniyin theme between the models o learnin with equivalence and malicious membership queries and o learnin a concept class with inite exceptions. Our model o errors in membership queries can be viewed as combinin an equivalence oracle or the taret concept and a membership oracle or a inite variant o the taret concept. In the case o learnin a concept class with inite exceptions, the equivalence and membership oracles present the same inite variant o a concept in the base class. In both cases, the oal is to identiy exactly the concept presented by the equivalence oracle Related Work There is a considerable body o literature on errors in examples in the PAC model, startin with the irst error-tolerant alorithm in the PAC model, iven by Valiant (1985). In this case the oal is PAC-identiication o the taret concept, despite the corruption o the examples by one or another kind o error, or example, random or malicious misclassiication errors, random or malicious attribute errors, or malicious errors (in which both attributes and classiication may be arbitrarily chaned). There has been not as much work on omissions and errors in learnin models in which membership queries are available, and the issues are not as well understood. One relevant distinction is whether the omissions o errors in answers to membership queries are persistent or not. They are persistent i repeated queries to the same domain element always return the same answer. In eneral, the case o persistent omissions or errors is more diicult, since non-persistent omissions or errors can yield extra inormation, and can always be made persistent simply by cachin and usin the irst answer or each domain point queried Non-persistent Errors in Queries Sakakibara deines one model o non-persistent errors, in which each answer to a query may be wron with some probability, and repeated queries constitute independent events (Sakakibara, 1991). He ives a eneral technique o repeatin each query suiciently oten to establish the correct answer with hih probability. This yields a uniorm transormation o existin query alorithms. The method also works or both o Bultman s models (Bultman, 1991). This could be a reasonable model o a situation in which the answers to queries were bein transmitted throuh a medium subject to random independent errors; then the technique o repeatin the query is eminently sensible. A related model is considered by Dean et al. (1995) or the case o a robot learnin a inite-state map o its environment usin aulty sensors and reliable eectors. This model assumes that observation errors are independent as lon as there is a nonempty action sequence separatin the observations. This means that there is no simple way to repeat the same query, since a nonempty action sequence may take the robot to another state, and no reset operation is available. A polynomial-time learnin alorithm is iven or the

5 MALICIOUS OMISSIONS AND ERRORS 215 situation in which the environment has a known distinuishin sequence. It achieves exact identiication with hih probability Persistent Errors in Membership Queries The method o repeatin the query is insuicient or the more diicult case o persistent omissions or errors in membership queries. In this case, we must exploit the errorcorrectin properties o roups o related queries. In an explicit and very interestin application o the ideas o error-correctin alorithms, Ron and Rubineld use the criterion o PAC-identiication with respect to the uniorm distribution, and ive a polynomial-time randomized alorithm usin membership queries to learn DFA s with hih rates o random persistent errors in the answers to the membership queries (Ron & Rubineld, 1995). Alorithms that use membership queries to estimate probabilities (in the spirit o the statistical queries deined by Kearns (1993)) are enerally not too sensitive to small rates o random persistent errors in the answers to queries. For example, Goldman, Kearns, and Schapire ive polynomial-time alorithms or exactly learnin read-once majority ormulas and read-once positive NAND ormulas o depth O(lo n) with hih probability usin membership queries with hih rates o persistent random noise or modest rates o persistent malicious noise (Goldman, Kearns & Schapire, 1993). As another example, Kushilevitz and Mansour s alorithm that uses membership queries and exactly learns loarithmicdepth decision trees with hih probability in polynomial time seems likely to be robust under nontrivial rates o persistent random noise in the answers to queries (Kushilevitz & Mansour, 1993). However, learnin alorithms or other classes o concepts usin equivalence and membership queries may depend more stronly on the correctness o the answers to individual queries; in these cases, there is no uarantee o a learnin alorithm or the class that can tolerate omissions or errors in the answers to membership queries. One model that addressed these questions was introduced by Anluin and Slonim: equivalence queries are assumed to be answered correctly, while membership queries are either answered correctly or with I don t know and the answers are persistent. The I don t know answers are determined by independent coin lips the irst time each query is made (Anluin & Slonim, 1994). They ive a polynomial-time alorithm to learn monotone DNF ormulas with hih probability in this settin. They also show that a variant o this alorithm can deal with one-sided errors, assumin that no neative point is classiied as positive. Goldman and Mathias also consider this model (Goldman & Mathias, 1992). Our current models and results dier in that the omissions and errors are chosen by a malicious adversary instead o a random process, and the rate o incorrect answers that can be tolerated is consequently much lower. Frazier et al. (1994) have introduced a model o omissions in answers to membership queries, called learnin rom a consistently inorant teacher. The basic idea is to require that i the teacher ives answers to certain queries that would imply a particular answer to another query, the teacher cannot answer the latter query with I don t know. For example, in the domain o monotone DNF ormulas, i the teacher classiies a particular point as positive, then the teacher cannot answer I don t know about any o the ancestors o the point. The oal o the learner is to learn exactly the ternary classiication o points into

6 216 D. ANGLUIN, ET AL. positive, neative, and I don t know that is presented by the teacher. Such a ternary classiication may be represented by the areement o a set o binary-valued concepts; the areement classiies a point as positive (respectively, neative) i all the concepts in the set classiy it as positive (respectively, neative), otherwise, the areement classiies the point as I don t know. Eicient learnin alorithms are iven in this model or monomials with at least one positive example, concepts represented as the areement o a constant number o monotone DNF ormulas, k-term DNF ormulas, DFA s, or decision trees, and concepts represented by an areement o boxes with samplable intersection. Compared to our model, this model has a dierent measure o the representational complexity o a concept with omissions, which allows a much hiher rate o omissions to be compactly represented. It also diers in requirin the learner to reproduce exactly the I don t know labels o the teacher, whereas in our (nonstrict) model o omissions such examples can be classiied arbitrarily. 2. Preliminaries 2.1. Concepts and Concept Classes Our deinitions or concepts and concept classes are a bit non-standard. We have explicitly introduced the domains o concepts in order to try to uniy the treatment o ixed-lenth and variable-lenth domains. We take Σ and Γ to be two inite alphabets. Examples are represented by inite strins over Σ and concepts are represented by inite strins over Γ. A concept consists o a pair (X, ), where X Σ, and maps X to {0, 1}. The set X is the domain o the concept. The positive examples o (X, ) are those w X such that (w) =1, and the neative examples o (X, ) are those w X such that (w) =0. Note that strins not in the domain o the concept are neither positive nor neative examples o it. A concept class is a triple (R, Dom,µ), where R is a subset o Γ, Dom is a map rom R to subsets o Σ, and or each r R, µ(r) is a unction rom Dom(r) to {0, 1}. R is the set o leal representations o concepts. For each r R, the concept represented by r is (Dom(r),µ(r)). A concept (X, ) is represented by a concept class (R, Dom,µ)i and only i or some r R, (X, ) is the concept represented by r. The size o a concept (X, ) represented by (R, Dom,µ)is deined to be the lenth o the shortest strin r R such that r represents (X, ). The size o (X, ) is denoted by (X, ) ; note that it depends on the concept class chosen. The concept classes we consider in this paper are boolean ormulas and syntactically restricted subclasses o them, boolean decision trees, and DFA s. The representations are more or less standard, except each concept representation speciies the relevant domain. For DFA s, the domain o every concept is the set Σ itsel. For boolean ormulas and decision trees, we assume that Σ={0,1}, and each concept representation speciies a domain o the orm {0, 1} n. For each inite set S o strins rom Σ, we deine its table-size, denoted S, as the sum o the lenths o the strins in S and the number o strins in S. Note that S =0i and

7 MALICIOUS OMISSIONS AND ERRORS 217 only i S =. The table-size o a set o strins is related in a straihtorward way to an encodin o a list o the strins; see Section Queries For a learnin problem we assume that there is an unknown taret concept r drawn rom a known concept class (R, Dom,µ). Inormation about the taret concept is available to the learnin alorithm as the answers to two types o queries: equivalence queries and membership queries. In an equivalence query, the alorithm ives as input a concept r R with the same domain as the taret, and the answer depends on whether µ(r) =µ(r ). I so, the answer is yes, and the learnin alorithm has succeeded in its oal o exact identiication o the taret concept. Otherwise, the answer is a counterexample, any strin w Dom(r) on which the unctions µ(r) and µ(r ) dier. We denote an equivalence query on a hypothesis h by EQ(h). The label or a counterexample v = EQ(r ) is the value o µ(r) on v, ivin its classiication by the taret concept. Since the hypothesized concept r and the taret concept r dier on the classiication o the counterexample v, the label o v is also the complement o the value o µ(r ) on v. Positive counterexamples are those with label 1 and neative counterexamples are those with label 0. In a membership query, the learnin alorithm ives as input a strin w Dom(r), and the answer is either 0, 1, or. I the answer is equal to the value o µ(r) on w, then the answer is correct. I the answer is equal to, we say that the answer is an omission or a Don t know. I the answer is 0 or 1 but not equal to the value o µ(r) on w, then the answer is an error or a lie. In a standard membership query, denoted MQ, all the answers are required to be correct. In a limited membership query, denoted LMQ, each answer is required to be correct or an omission. In a malicious membership query, denoted MMQ, each answer is required to be correct or an error (no omissions). Note that an answer o 0 or 1 to a limited membership query is always correct, but this is not true or answers to malicious membership queries. The answers to malicious and limited membership queries are also restricted as ollows. 1. They are persistent; that is, dierent membership queries with the same input strin w receive the same answer. Note that non-persistent queries may reveal some inormation; in case two dierent queries to the same strin receive dierent answers, the learnin alorithm knows that there has been an error on this strin, thouh this will not in eneral determine the correct classiication o the strin. Every alorithm desined to work with persistent queries can be made to work with non-persistent ones by cachin the queries and always usin the irst answer or subsequent queries o the same strin. 2. In addition, we bound the quantity o errors (or omissions) permitted in answers to malicious (resp., limited) membership queries. One natural quantity to bound would be the number o dierent strins whose membership queries can be answered incorrectly, and this works well in ixed-lenth domains. However, in variable-lenth domains, we wish to account or the lenths o the strins as well as their number.

8 218 D. ANGLUIN, ET AL. Thereore, in eneral the alorithm is iven a bound L on the table-size, S, o the set S o strins whose malicious (resp., limited) membership queries are answered erroneously (resp., with a ) durin a sinle run. In the case o a ixed-lenth domain, {0, 1} n, we may instead ive a bound l on the number o dierent strins whose MMQ s (resp., LMQ s) are answered incorrectly (resp., with a ). Note that L = l(n +1)is a bound on the table-size in this case. Note that when L =0or l =0there can be no errors or omissions in the answers to MMQ s (or LMQ s) and we have the usual model o standard membership queries as a special case. We assume that an on-line adversary controls the choice o counterexamples in answers to equivalence queries and the choice o which elements o the domain will be answered with errors (or s) in malicious (or limited) membership queries. When the learnin alorithms we consider are deterministic, the adversary may be viewed as choosin in advance the set o strins or which it will ive incorrect answers to membership queries, as well as all the counterexamples it will ive to equivalence queries. In this paper we consider models in which the learnin alorithm has access to the ollowin combinations o queries: 1. membership and equivalence queries, 2. limited membership queries alone, 3. limited membership queries and equivalence queries, and 4. malicious membership queries and equivalence queries. Model (1) is just the usual model o a minimally adequate teacher (Anluin, 1987). In model (2), the learnin alorithm need only achieve nonstrict identiication. In other words, the concept output by the learnin alorithm must aree with the taret concept on all points not answered by the LMQ, but it may dier on points answered. This corresponds to our view that points orm the borderline o the concept and that the classiication o them is irrelevant or meaninless. For model (3) we consider both nonstrict and strict variants. In the nonstrict variant, equivalence queries are modiied so that i the queried concept and the taret concept dier only on points classiied as by the LMQ, then the reply is yes. Otherwise, a counterexample must be iven rom the set o points not classiied as by the LMQ. In this case, as in model (2), only nonstrict identiication is required. In the strict variant o model (3), as well as in model (4), equivalence queries are not modiied, and the learnin alorithm is required to achieve the usual kind o exact identiication, that is, the output concept must aree with the taret concept on every point in their common domain. We extend the usual notion o polynomial-time learnin to models (2-4) by allowin the polynomial bound on the runnin time to depend on three parameters, that is, p(s, n, L). Here s is the usual parameter boundin the lenth o the representation o the taret concept, n is the usual parameter boundin the lenth o the lonest counterexample seen so ar, and L is a new parameter, boundin the table-size o the set o strins on which LMQ answers or MMQ ives incorrect answers.

9 MALICIOUS OMISSIONS AND ERRORS 219 The deinitions are extended in the usual way to cover randomized learnin alorithms and their expected runnin times, and also extended equivalence queries, in which the inputs to equivalence queries and the inal result o the alorithm are allowed to come rom a concept class dierent rom (usually larer than) the concept class rom which the taret is drawn. It is straihtorward to transorm any alorithm that uses malicious membership queries into one that uses limited membership queries. Every answer can be replaced by a 0 or a 1 arbitrarily and iven to the learner. Thereore learnin with malicious membership queries is at least as hard as learnin with limited membership queries in the strict model. The same applies to learnin rom equivalence and malicious membership queries and learnin rom equivalence and limited membership queries in the strict model. Furthermore, in Subsection 6.3 we show that the strict and the nonstrict models o learnin rom equivalence and limited membership queries are in act polynomial-time equivalent. Note that the most eneral kind o membership queries is one in which both wron and answers are possible, but such queries are not harder than the malicious ones and thereore we consider only the latter Monotone DNF Formulas We assume a set o propositional variables V and denote its elements by x 1,x 2,...,x n, where n is the cardinality o V. A monotone DNF ormula over V is a DNF ormula over V where no literal is neated. The domain o such a ormula is {0, 1} n. For example, or n =20, x 1 x 4 x 2 x 17 x 3 x 9 x 5 x 12 x 3 x 8 is a monotone DNF ormula (with domain {0, 1} 20 ). We assume that the taret ormula h has been minimized, that is, it contains no redundant terms. (Incidentally, there is an eicient alorithm to minimize the number o terms o a monotone DNF ormula.) For a monotone DNF ormula, let #() denote the number o terms in. In the above example, #() =4. We view the domain {0, 1} n o monotone DNF ormulas (with or without exceptions) as a lattice, with componentwise or and and as the lattice operations. The top element is the vector o all 1 s, and the bottom element is the vector o all 0 s. The elements are partially ordered by, where v w i and only i v[i] w[i] or all 1 i n. Oten we reer to the examples as points o the hypercube {0, 1} n. For a point v, all points w such that w v are called the descendants o v. Those descendants that can be obtained by chanin exactly one coordinate o v roma1toa0arecalled the children o v. The ancestors and the parents are deined similarly. Note that v is both a descendant and ancestor o itsel. For convenience, we use a representation o monotone DNF ormulas in which each term is represented by the minimum vector, in the orderin, that satisies the term. Thus, vector (where n =5) denotes the term x 1 x 4 x 5. In this representation, i h is a monotone DNF ormula and v is a vector in the domain, v satisies h i and only i or some term t o h, t v. That is, a monotone DNF ormula is satisied only by the ancestors o its terms. In the other direction, we say that term t covers point v i and only i v satisies t. For the sake o simplicity we oten use in our alorithms somethin called the empty

10 220 D. ANGLUIN, ET AL. DNF ormula. This is the ormula with no terms, which is not satisied by any point, and is thereore the identically alse ormula. For any n-arument boolean unction, we call point x a local minimum point o i (x) =1but or every child y o x in the lattice, (y) =0. The local minimum points o a minimized DNF ormula represent its terms in our representation. For two n-arument boolean unctions 1 and 2 we deine the set Err( 1, 2 )to be the set o points where they dier. I.e., Err( 1, 2 )={x 1 (x) 2 (x)}. The cardinality o Err( 1, 2 )is called the distance between 1 and 2 and is denoted by d( 1, 2 ). 3. Limited Membership Queries In this section, we present results concernin learnin with the help o limited membership queries. We start with the description o a subroutine that is repeatedly used in all alorithms or this model Delimitin A Term Alorithm Delimit takes a positive point and inds a set o candidates or a term in the taret concept coverin the point. This alorithm plays the role o the alorithms called Reduce in other works on learnin monotone DNF (Anluin & Slonim, 1994). We choose a dierent name because those alorithms output a sinle monomial, whereas Alorithm Delimit inds a set o points that must include a correct term ? ?? ?? Fiure 1. Example run o Alorithm Delimit or taret concept x 1 x 2. Boldace + s, s and?s indicate responses 1, 0, and, respectively, o the LMQ oracle. Consider, or example, the situation in Fiure 1 where the taret concept is x 1 x 2, and we start with the known positive point p = With complete inormation, we would bein by queryin each child o p, updatin p to be the irst positive child ound. This process would be repeated until eventually we had p = Ater determinin by membership queries that every child o p is neative, we could stop.

11 MALICIOUS OMISSIONS AND ERRORS 221 Delimit(p) obits =000 {z } n hroot; DKi= Down(p; obits) hp; DKi = Up(DK) P = P [root T = DK [root Return ht;pi Fiure 2. Alorithm Delimit. obits is a bit array used in recursive subroutine Down to improve eiciency; root is a special positive point (with everythin beneath it bein neative or belonin to DK); DK is a certain set o points with LMQ, created by Down and urther thinned by Up; P is a certain set o positive points above DK, a useul byproduct o Up; T is the set o points amon which the correct term o the taret ormula must lie. Because Alorithm Delimit can make only limited membership queries, it may encounter the diiculty that some positive point p has children that are all Don t know, or a mix o No and Don t know. For instance, in the example in Fiure 1, all queries o children o 1101 return 0 or. In this case, Alorithm Delimit continues by queryin all the children o all the Don t know points. Should it ever et another 1 response to a limited membership query, it replaces p by that point. What we have just described is the subroutine Down o Delimit, which invokes itsel recursively upon indin a new positive point. Detailed code o alorithm Delimit and its subroutines is iven in Fiures 2, 3 and 5. In our C-like pseudocode we oten use a For loop over a chanin set o points; or example, For (each b A). By this we mean that in each iteration o the loop the current point (i.e., b in this example) is marked, and that only unmarked points can be considered in the next iterations. Furthermore, the loop condition is checked every time, that is, we check or an existence o some unmarked point in the possibly chanin set (A in this example). I one exists, we mark it and do the body o the loop which may add new (unmarked) points to the set or delete some existin points (either marked or unmarked). Points that are deleted beore markin are not processed. Furthermore, to ensure that the alorithms are deterministic, we need that the current point is chosen accordin to some unambiuous rule, i.e., there must not be any choice as to which unmarked point o the set will be marked and processed next. Thus, we assume that there is some total order on all the elements in the sample space and use this to unambiuously choose the current point. When the points o the sample space correspond to assinments o 0 and 1 to n variables, the easiest total order is created by treatin each point as an n-bit number. In all our alorithms that learn boolean ormulas, we assume that this total order is used and always pick the point with the lowest number. Another thin worth explanation about our pseudocode is the use o,,..., on the let side o assinments and as return values or subroutines. We do this to avoid conusin lobal variables or thins passed by reerence. Thus, everythin is passed by value (by makin a copy in the called subroutine) and everythin that the callin routine needs is returned to it explicitly. I many thins need to be returned, then we put them in a tuple,

12 222 D. ANGLUIN, ET AL. denoted by,,...,, and return the tuple. The tuple is basically just a convenient notation or a structure. Down(p; obits) DK = C = c : c isachild o p && c obits While (C 6= ) a = maximal element o C with the lowest number Delete a rom C I (LMQ(a) ==1) Return Down(a; obits) Else I (LMQ(a) ==?) DK = DK [a C =C[c:c is a child o a && c obits Else I (a isachild o p) i = bit where a and p dier obits[i] =1 Return hp; DKi Fiure 3. Subroutine Down. p is the point it is called on; obits is a bit array used to improve eiciency; DK is a set o points with LMQ and with all descendants or neative; C is a set o points that have to be processed in the main loop. The subroutine Down uses a variable obits to improve eiciency. I a query to a child a o a known positive point p ives a 0 response, then we know that in any descendant o p, switchin o the bit position that distinuishes a rom p will lead to a neative point, because this point will be a descendant o a. Thereore, variable obits keeps track o those bit positions that must be 1, allowin subroutine Down to save some queries. Eventually Down is called on some positive point p and inds a (possibly empty) set DK o descendants o p such that the limited membership oracle responded or every point in DK, and all other proper descendants o p are known to be neative. Then Down returns and point p rom then on is called the root. Since root is positive, and we know that every descendant o root not in set DK is neative, a term o the monotone DNF must lie somewhere in the set DK {root}. This set is outlined in Fiure 4. The next major part o Delimit is the subroutine Up. Any point correspondin to a term o the taret concept must have only positive ancestors. Subroutine Up ensures that

13 MALICIOUS OMISSIONS AND ERRORS root ? ?? ?? Fiure 4. The set o candidate terms obtained by subroutine Down. Up(DK) P = For (each a 2 DK) A = parents o a For (each b 2 A) I (LMQ(b) ==0) Return hp; DKi Delete all descendants o b rom DK Break /* Out o the For (each b 2 A) loop */ Else I (LMQ(b) ==1) P =P [b Else A = A [parents o b Fiure 5. Subroutine Up. DK is a set o points with LMQ that has to be thinned; P is a set o minimal ancestors with LMQ 1 o points in DK; A is a set o certain ancestors o the current point a. no point in DK has any ancestor with LMQ 0. Thus, Delimit ets a thinned set o possible terms T. In the example o Fiures 1 and 4, 0100 is deleted rom DK because LMQ(0101) = 0. ILMQ(1010) = 0, then 1000 will also be deleted. As a useul byproduct o subroutine Up, we et a set P, which consists o the minimal ancestors with LMQ 1 o the points in DK (i.e., the minimal points in the lattice orderin o the set o those ancestors o elements o DK that have LMQ 1). I there are no points in DK, P contains the root only.

14 224 D. ANGLUIN, ET AL. We summarize our discussion in the ollowin lemma. Recall that since Delimit is deterministic, the adversary may be thouht o as choosin in advance the answers to all possible limited membership queries. We denote by LMQ(v) the value o this unction on the point v. Lemma 1 Let T and P be the outputs o runnin Alorithm Delimit on a positive point p when the taret concept is a monotone DNF. Then 1. T contains a term o the taret concept coverin p. 2. For every t T, every point covered by t has LMQ 1or. 3. For any v such that LMQ(v) =1, and or any t T that covers v, t covers some point in P that is a descendant o v. Proo: Part 1 ollows rom the discussion above. Part 2 holds since every ancestor o the root is positive (since root itsel is), and since the code in Up ensures this condition or every other point in T. To see Part 3, suppose v has LMQ(v) =1and t is some point o T that covers v. I t is root, then root itsel is a point in P that is a descendant o v. I t is some other element o T, then Up has made an upward search rom t to ind all the minimal ancestors o t with LMQ 1 and placed them in P. We are now ready to analyze the runnin time o Delimit. Lemma 2 For any monotone DNF taret concept, Alorithm Delimit makes at most nl + n limited membership queries, where l is the number o responses received. In eneral, any sequence o s calls to Alorithm Delimit or the same taret concept makes at most n(l + s) limited membership queries. Proo: Every point queried by Delimit(p) is either a child o p, or a child o a previously queried point with LMQ 1, or else is within Hammin distance 1 o a point. For the last case, each time we receive a response on a point v, we could in principle need to query all the children o v in Down and then all the parents o v in Up, except that there is a parent or a child o v that we must have queried beore queryin v. This leads to at most n 1 queries or each received, plus 1 or the query itsel. As lon as the alorithm remembers the answers to all queries, this holds or any number o calls to Delimit on the same concept. For the irst two cases, let us irst note that they only happen in subroutine Down. Let us call the point that Down was called on the current point, which can be either p or some previously queried point a with LMQ(a) =1. When Down makes a query on a child o the current point and receives a response o, the query is already accounted or. I it receives a response o 0, then a bit is turned on in obits, decreasin the Hammin distance between the current point and obits. I it receives a response o 1, then a recursive call to Down is made, and the Hammin distance between the (new) current point and obits is also one less. Since the maximum Hammin distance between two points is n, there can be at most n queries o this kind in every call to Delimit. The runnin time o Delimit is polynomial in the number o queries it makes.

15 MALICIOUS OMISSIONS AND ERRORS Learnin Monotone Monomials rom Limited Membership Queries Alone We bein with a very simple application o Alorithm Delimit to learn monotone monomials rom limited membership queries in the nonstrict model. This should elucidate the basic ideas at the heart o the more complicated alorithm or learnin arbitrary monotone k-term DNF rom limited membership queries that ollows. Theorem 1 Monotone monomials can be learned rom no more than nl + n +1limited membership queries in the nonstrict model, where l is the number o responses received. Proo: The method is to run Alorithm Delimit startin with the all 1 s vector, and then output any term t T that covers every point in P. (I either LMQ(11 1) = 0, or LMQ(11 1) = and the down phase o Alorithm Delimit inds no positive points, then we will output the empty DNF ormula.) First, observe that such a term t must exist in T, since the true taret monomial covers every point in P and lies in T by point 1 o Lemma 1. The output term t covers every point with LMQ 1 by point 3 o Lemma 1, since it covers every point in P. And by point 2 o Lemma 1, it cannot cover any points with LMQ 0, since it is in T. The bound on the number o queries ollows rom Lemma 2, which counts all o them except or the irst query to The runnin time aain is polynomial in the number o queries made. Another computationally somewhat more eicient way o learnin monotone monomials would be to run Delimit on the all 1 s vector, and then output the monomial m that is the intersection o all the points in P. That is, the learner s output would have a variable i and only i that variable is in every term in P. In this version, monomial m covers every point in P, so by point 3 o Lemma 1, it must cover all points v such that LMQ(v) =1. On the other hand, since m is the intersection o positive points, it must either be the correct monotone monomial or an ancestor o the correct monotone monomial, so m cannot cover any neative points. Now we present a lower bound or this problem. Theorem 2 For any inteer c, i the limited membership oracle ives c ( n k=0 k) responses o, then any learner can be orced to use at least ( n c+1) 1 limited membership queries not countin those answered to learn monotone monomials. Proo: Let the taret concept be deined by a monomial o lenth n (c +1). It covers no point with more than (c +1)0-bits, and exactly one point with (c +1)0-bits. Now let us assume that all queries o a learnin alorithm on points containin c or ewer 0-bits are answered by, and its irst ( n c+1) 2 queries o points with at least (c+1) 0-bits are answered by 0. Ater that there are at least two unqueried points with (c+1)0-bits. The correspondin two monomials are both consistent with the answers iven so ar and they dier on points that did not receive a response. Hence the learnin alorithm needs at least one more query.

16 226 D. ANGLUIN, ET AL. Corollary 1 For any ixed constant c, i the limited membership oracle ives O(n c ) responses o, then the learner can be orced to use Ω(n c+1 ) limited membership queries to learn monotone monomials Learnin Monotone k-term DNF rom Limited Membership Queries Alone We are now ready to present the alorithm OnlyLMQ, that learns monotone k-term DNF in the nonstrict model rom limited membership queries alone. The detailed code is iven in Fiures 6 and 7. FindPos(h) S = maximal elements v 20;1 n s.t. h(v) ==0 For (each a 2 S) I (LMQ(a) ==1) Return a Else I (LMQ(a) ==?) S= S[children o a Return \no" Fiure 6. Subroutine FindPos. S is the set o maximal points in the sample space that are not covered by the current hypothesis h. Alorithm OnlyLMQ initializes its hypothesis h to be the empty ormula, and repeats the ollowin. By searchin down rom each maximal point not covered by hypothesis h, a point q with LMQ(q) =1that is not covered by h is ound. This is done by the subroutine FindPos o Alorithm OnlyLMQ. I no such point is ound, then h is correct and the alorithm outputs it and halts. Now the point q is iven to Alorithm Delimit, which returns the sets T and P. I there is a sinle term in T that covers all the positive points in P not already covered by h, then this term is added to h and we repeat the main For loop by lookin or another point with LMQ 1 that is not covered by the current hypothesis. Otherwise, we have to add more than one term to h to cover all the points in P, and we bein a search or a set o terms to add. This search may disclose more positive points, so this process itsel may have to be iterated. We bein by initializin Terms to be T and Pos to be those points in P that are not covered by h. We record the act that point q has already been delimited by placin it in the set Delimited. We then call Alorithm Delimit on every point in Pos and or each call ather toether the points rom P in NewPos as well as add the terms rom T to Terms. When all points in Pos have been delimited, we add the points athered in NewPos to Pos and try to cover the newly enlared set Pos with any j =2 terms rom Terms. I we succeed, we put these terms in h; i we ail, we keep enlarin Terms, atherin points in NewPos, then enlarin Pos, incrementin j, and tryin aain in the same manner to cover all points in Pos with j terms rom Terms.

17 MALICIOUS OMISSIONS AND ERRORS 227 OnlyLMQ() h = \the empty DNF ormula" For (ever) q = FindPos(h) I (q == \no") Output h Return ht;pi = Delimit(q) Delimited = q Terms = T Pos = r 2 P : h(r) ==0 For (j =1; ;j++) I (9t 1 ;t 2 ;:::;t j 2 Terms s.t. 8p 2 Pos ((t 1 _ :::_t j )(p) ==1)) Else Add the terms t 1 ;:::;t j to h Break /* Out o the For (j =1; ;j++) loop */ NewPos = For (each p 2 Pos s.t. p 62 Delimited) ht;pi = Delimit(p) Delimited = Delimited [p NewPos = NewPos [r 2P :h(r)==0 Terms = Terms [ T Pos = Pos [ NewPos Fiure 7. Alorithm OnlyLMQ, which learns monotone k-term DNF rom LMQ s. Delimited is the set o points that have been delimited in the current iteration o the main loop; Terms is the set o candidate terms; Pos is a set o known positive instances that need to be covered by a certain number o points in Terms.

18 228 D. ANGLUIN, ET AL. Theorem 3 Alorithm OnlyLMQ learns monotone k-term DNF rom O(kn k + n 2 l) limited membership queries in the nonstrict model, where l is the number o responses it receives. Proo: We prove the theorem in three staes. First we arue that the alorithm eventually terminates with a DNF hypothesis h that correctly classiies all non- points. Next, in Lemmas 3 throuh 6, we arue that h includes at most k terms. In particular, Lemma 6 says that at all times throuh the run o the alorithm, h is at least as eicient in terms o positive instances covered per DNF term as the taret concept. Finally, we arue that the query bound is correct. * Alorithm Delimit terminates with a correct hypothesis. All the terms added to the hypothesis are at some point in a set T output by Alorithm Delimit. Thereore, by point 2 o Lemma 1, no term that we ever add to our hypothesis can err by classiyin a point with LMQ 0 as positive. Furthermore, since point 1 o Lemma 1 uarantees that each time we call the subroutine Delimit rom a point we et at least one term in T coverin that point, the alorithm must eventually succeed in coverin the set Pos with some number o terms and break out o the For (j =1; ; j++) loop. Since there are only a inite number o positive points, this means that the alorithm eventually terminates with a hypothesis that correctly classiies all non- points. * Final hypothesis contains at most k terms. The ollowin lemmas provide the arument that each hypothesis o Alorithm OnlyLMQ contains at most k terms. Lemma 3 No term in the set Terms is ever implied by the current hypothesis h inside o the For (j =1; ;j++) loop. Proo: The hypothesis h is constant durin the execution o the loop. Every element o Terms entered Terms rom the set T enerated by callin Alorithm Delimit with this hypothesis h. The irst call to Delimit (the one made immediately beore enterin the For (j =1; ; j++) loop) is made on some point q such that h(q) =0, and thereore no element o T can be implied by h. All subsequent calls to Delimit are made on some point r Pos. The set Pos is constructed so that h(r) =0, so aain no points in the set T returned by Delimit can be implied by h, because T contains only descendants o r. Lemma 4 Whenever Alorithm OnlyLMQ attempts to cover the set o positive points Pos with a disjunction o j terms rom Terms (line I ( t 1,t 2,...,t j Terms s.t. p Pos ((t 1... t j )(p) ==1)) in the code), the set Terms actually contains at least j distinct terms o the taret concept. Proo: The proo is by induction on j. For the base case, j =1, the set Terms = T. Thus the base case is provided by point 1 o Lemma 1, which says that there must must be a term o the taret concept in T at the end o the irst call to Alorithm Delimit. For the inductive step, suppose that we are tryin to cover Pos with j +1terms. Then we know that we tried and ailed to cover the previous version o Pos with j terms. By the inductive hypothesis, this implies that the previous version o Terms contained at least j

19 MALICIOUS OMISSIONS AND ERRORS 229 distinct terms o the taret concept. I the previous version o Terms in act contained more than j distinct terms o the taret concept, then we are done. Otherwise, the previous version o Terms contained exactly j distinct terms o the taret concept, but nevertheless ailed to cover all points in Pos. That is, there was some point p in the previous version o Pos not covered by any o those particular j terms. This point p cannot have been delimited beore the attempt to cover Pos with j terms, because i it had, the previous version o Terms would have contained an element coverin p. Thereore, ater attemptin and ailin to cover Pos with j terms p is delimited. Now by point 1 o Lemma 1, a term o the taret concept coverin p is in the set T output by this call to Delimit, and this point is added to Terms. Thus Terms now contains at least j +1distinct terms o the taret concept. Lemma 5 Let h be the hypothesis that Alorithm OnlyLMQ obtains when it updates its hypothesis h. Then h covers all points with LMQ 1 that are covered by any term in Terms. Proo: Assume the contrary holds or h, Terms, and h. Let t Termsbe a term and x be a point with LMQ 1 such that h (x) =0and t(x) =1. Term t was added to Terms by some call to subroutine Delimit. At that call, accordin to point 3 o Lemma 1, the set P must have contained a point p such that t covered p and p is a descendant o x. Since h (x) =0, we know h(x) =0,soh(p)=0. Thereore, ater the call to Alorithm Delimit when the outputs had t T and p P, point p was added to Pos. Thus since h is satisied by all points in Pos, it must be that h (p) =1and thus h (x) =1, a contradiction. Lemma 6 Consider the hypothesis h o Alorithm OnlyLMQ at any point in the run o it or a monotone DNF taret concept c. There is a monotone DNF d consistin o at least #(h) distinct terms rom c such that {v : d(v) =LMQ(v) =1} {v:h(v)=1}. Proo: The proo is by induction on the number o times that h is enlared by addin new terms. The base case with an empty h holds trivially. Let us now assume that the lemma holds up to the k-th time we add new terms to h, and that it does not hold the k +1-st time. Let us denote h ater these additions o terms by h k and h k+1, respectively. That is: (1) there are #(h k ) terms t 1,t 2,...,t #(hk ) in c such that they cover a subset o those points with LMQ 1 that are covered by h k, (2) or any way we take #(h k+1 ) terms rom c, there always is some point w with LMQ 1 that is covered by these terms but not by h k+1. Furthermore, let us assume that exactly j terms were added to h k when makin h k+1, that is, #(h k+1 ) #(h k )=j. At the moment these j terms were added, Lemma 4 uaranteed the existence o j terms t 1,t 2,...,t j rom c in Terms. By Lemma 3 none o t 1,t 2,...,t j was implied by h k. Now suppose we take terms t 1,t 2,...,t #(hk ),t 1,t 2,...,t j, which are all in c, or a total o #(h k+1 ) terms. By our assumption there is a point w, such that LMQ(w) =1, that is covered by these terms but not by h k+1. By the assumption, w cannot be covered by any o the t 1,t 2,...,t #(hk ), or it would be covered by h k and hence

20 230 D. ANGLUIN, ET AL. h k+1. Thereore, it must be the case that w is covered by some o t 1,t 2,...,t j. But then, by Lemma 5, h k+1 must cover w, a contradiction. Lemma 6 implies that the hypothesis h o Alorithm OnlyLMQ never contains more than k terms, which is what we needed to show. What remains or the proo o Theorem 3 is to show that the number o limited membership queries is O(kn k + n 2 l). * Number o queries made. A monotone k-term DNF ormula can have at most n k maximal neative points. This ollows rom the ollowin observations. I v is a neative point and t is a term, then there must be a variable x i in t such that v[i] =0, or else v would satisy t. Thus, by settin to 0 at least one variable rom each term, we can obtain exactly the neative points. Maximal neative points are a subset o the points that can be obtained by settin exactly one variable rom each term to 0, which can be done in at most n k ways. Thus, each time S is initialized in subroutine FindPos, it can have at most n k elements. FindPos is called at most k +1times, since each time it is called (except the last), it returns a new positive point q which causes at least one term to be added to h. Each call to FindPos perorms at most one LMQ or each o at most n k elements initially in S. Ater that, each element queried is a child o a point with LMQ, so the number o such queries over all calls to FindPos is at most nl. (We assume FindPos caches answers.) Thus, the number o LMQ s made by FindPos is at most (k+1)n k +nl. The only other LMQ s are made in calls to Delimit, and by Lemma 2 the total number o queries is bounded by n(l + s), where s is the number o calls to Delimit. Delimit is called at most k times on points q returned by FindPos, and is called at most once or each element that is ever added to Pos. The elements added to Pos are all returned in P by Up, which means that they are all parents o points with LMQ. Hence, at most nl elements are ever added to Pos, and the total number s o calls to Delimit is bounded by k + nl. This ives a bound o n(l + k + nl) or the number o LMQ s made in the calls to Delimit. By addin the bound or the number on LMQ s made in calls to FindPos, we can easily et the desired bound o O(kn k + n 2 l) on the total number o LMQ s. The runnin time o the alorithm OnlyLMQ is clearly polynomial in the number o queries it makes Learnin Monotone DNF rom Equivalence and Limited Membership Queries In this subsection, we ive a very simple alorithm that learns monotone DNF with an unbounded number o terms rom equivalence and limited membership queries. Theorem 4 Monotone DNF ormulas can be learned rom n(m + l) limited membership queries toether with m +1equivalence queries in the nonstrict model, or toether with m + l + 1 equivalence queries in the strict model, where l is the number o responses received, and m is the number o terms in the taret monotone DNF. Proo: We modiy Anluin s alorithm or learnin monotone DNF rom ordinary membership and equivalence queries (Anluin, 1988), by usin Alorithm Delimit. The com-