Learning Twig and Path Queries


Sławek Staworko, Mostrare, INRIA & LIFL (CNRS UMR8022), University of Lille, France, slawomir.staworko@inria.fr
Piotr Wieczorek, Institute of Computer Science, University of Wrocław, piotr.wieczorek@cs.uni.wroc.pl

ABSTRACT
We investigate the problem of learning XML queries, path queries and twig queries, from examples given by the user. A learning algorithm takes on the input a set of XML documents with nodes annotated by the user and returns a query that selects the nodes in a manner consistent with the annotation. We study two learning settings that differ with the types of annotations. In the first setting the user may only indicate required nodes that the query must select (i.e., positive examples). In the second, more general, setting, the user may also indicate forbidden nodes that the query must not select (i.e., negative examples). The query may or may not select any node with no annotation. We formalize what it means for a class of queries to be learnable. One requirement is the existence of a learning algorithm that is sound i.e., always returning a query consistent with the examples given by the user. Furthermore, the learning algorithm should be complete i.e., able to produce every query with sufficiently rich examples. Other requirements involve tractability of the learning algorithm and its robustness to nonessential examples. We identify practical classes of Boolean and unary, path and twig queries that are learnable from positive examples. We also show that adding negative examples to the picture renders learning unfeasible.

Categories and Subject Descriptors: H.3.3 [Information Systems]: Information Search and Retrieval, Query formulation; I.2.6 [Artificial Intelligence]: Learning
General Terms: Algorithms, Languages, Theory.
Keywords: XML, XPath, twigs, learning, query containment, query inference, minimality, consistency.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ICDT 2012, March 26-30, 2012, Berlin, Germany.
Copyright 2012 ACM 978-1-4503-0791-8/12/03...$10.00

1. INTRODUCTION
XML has become a de facto standard for representation and exchange of data in web applications. An XML document is basically a labeled tree whose leaves store textual data and the standard XML format is text based to allow users an easy and direct access to the contents of the document [42]. However, to satisfy even modest information needs, the user is often required to formulate her queries using one of existing query languages whose common core is XPath [43, 44]. XPath queries allow to access the contents of the desired nodes with a syntax similar to directory paths used to navigate in the UNIX file system. Unfortunately, even the XPath query language, and any language with formal syntax, might be too difficult to be accessible to every user, and in general, there is a lack of frameworks allowing the user to formulate the query without the knowledge of a specialized query language. In this paper, we propose to address this gap with the help of algorithms that infer the query from examples given by the user.

We remark, however, that the need for general inference of XML queries is justified by other novel database applications. For instance, in the setting of XML data exchange [6] the pattern queries used to define data mappings need to be specified by the user. A learning algorithm could be a base for real ad-hoc data exchange solutions, where the pattern queries defining mappings are inferred as new sources are discovered. Another example of potential application is wrapper induction [20, 39].

The problem of XML query learning is defined as follows: given an XML document with nodes annotated by the user construct a query that selects the nodes accordingly to the annotations. Clearly, this problem has two parameters: the class of queries within which the algorithm should produce its result and the type of annotations the user may use. In the current work we focus on two well-known subclasses of XPath: twig and path queries [1]. We identify two types of annotations: required nodes i.e., nodes that need to be selected by the query, and forbidden nodes i.e., those that the query must not select. Because we do not require all nodes to be annotated, every unannotated node is implicitly annotated as neutral, which means that the query may or may not select it.
In terms of computational learning theory [22], a required node is called a positive example and a forbidden node is a negative example. In this paper, we consider two settings: one, where the user provides only positive examples, and a more general one, where both positive and negative examples are present.

Example 1 Take for instance the XML document in Figure 1 with a library listing. Some of its elements are annotated as required (+) and some as forbidden (−).

[Figure 1: Annotation of library database. A tree rooted at library whose collection and book elements have title and author children; title nodes carry the annotations + and −.]

The query that the user might want to receive is one that selects the titles of works by K. Marx: q_0 = /library/⋆[author="K. Marx"]/title. The query /library/⋆[author="K. Marx"]/⋆ is also consistent with the annotation but it properly contains q_0. This makes q_0 more specific w.r.t. the user annotations, and therefore, may be better fitted for the results of learning. The query selecting titles of all works, /library//title, is not consistent because it selects the forbidden title node. The query /library/book[author="K. Marx"]/title is also not consistent with the annotation because it does not select the required title node of Capital.

Our study requires us to define precisely what it means for a class of queries Q to be learnable. We propose a definition influenced by computational learning theory [22], and inference of languages in particular [21, 32, 13]. First of all, for Q to be learnable there must exist a learning algorithm learner which on the input takes a sample S i.e., a set of examples, and returns a query q ∈ Q. Naturally, learner should be sound, that is the query q must be consistent with the sample S. Because the soundness condition is not enough to filter out trivial learning algorithms (cf. discussion following Definition 2), we furthermore require learner to be complete, that is able to learn every query with sufficiently informative examples. More precisely, learner is complete if for every q ∈ Q there exists a so called characteristic sample CS_q of q (w.r.t. learner) such that learner(CS_q) returns q. Note that an unsavvy user in the role of a teacher may not know exactly what the characteristic sample is, but may attempt to approach it by adding more and more examples until the algorithm returns a satisfactory query. Consequently, it is commonly required for the characteristic sample to be robust under inclusion i.e., learner(S) should return q for any sample S that extends CS_q while being consistent with q.
Finally, polynomial restrictions are imposed on learner and the size of the characteristic sample to ensure tractability of the framework.

The primary goal of this paper is learning unary queries, but on the way there we also investigate the learnability of Boolean queries. Unary queries select a set of nodes in a document and are typically used for information extraction tasks. On the other hand, Boolean queries test whether or not a given document satisfies a certain property, and their typical use case is the classification of documents e.g., for filtering purposes. When learning a Boolean query, an example is a tree with a marker indicating whether it is a positive or a negative example.

Example 2 Consider a simple XML feed with offers from a consumer-to-consumer web site (Figure 2) annotated by the user as either required (+) or forbidden (−).

[Figure 2: An annotated XML stream. Offers contain items with type values "For sale" and "Wanted" and descriptions such as Audi A4, MacBook, 3D Puzzle and Eee PC.]

A Boolean query satisfying the user annotations selects all sale offers i.e., q_1 = .[offer//item/type="For sale"].

We investigate the learnability for Boolean and unary, path and twig queries in the presence of positive examples only and in the presence of both positive and negative examples. For learning in the presence of positive examples only, we identify practical subclasses of anchored path queries and path-subsumption-free twig queries that are learnable. The main idea behind our learning algorithms is to attempt to construct an (inclusion-)minimal query consistent with the examples. Intuitively this means that our algorithms try to construct a query that is as specific as possible with respect to the user input (cf. q_0 in Example 1). This approach is common to a host of algorithms learning concepts from positive examples [3] including reversible regular languages [4], k-testable regular languages [18], and single occurrence regular expressions [8]. While our learning algorithms for path queries return minimal queries consistent with the input sample, we show that this approach cannot be fully adopted for twig queries because there are input samples for which the consistent minimal twig query is of exponential size. Here, our learning algorithms return queries that can be seen as polynomially-sized approximations.
The learnability of the full classes of path and twig queries remains an open question. However, we identify the essential properties of the query classes that enable our learning techniques, and observe that these properties do not hold for the full classes of path and twig queries. This indicates that new approaches may need to be explored if learning of the full classes is feasible at all.

In the setting where both positive and negative examples are allowed, we study the consistency problem: given a document with a set of positive and negative annotations is there a query that satisfies the annotations? This problem is trivial if only positive examples are given because the

universal query, that selects all nodes in a tree, is consistent with any set of positive examples. However, as we show, adding even one negative example renders the consistency problem intractable. This result holds for all considered classes of queries, including anchored path queries and path-subsumption-free twig queries, and in fact, it holds for so simple classes of queries that it is hard to envision some reasonable restrictions that would admit learnability in the presence of positive and negative examples.

The main contribution of this paper is defining and establishing theoretical boundaries for learning path and twig queries from examples. To the best of our knowledge this is the first work addressing this particular problem. Additionally, we investigate two problems that might be of independent interest: constructing a minimal query consistent with a set of positive examples and checking the consistency of a set of positive and negative examples. The characterization of the properties of the learnable classes of queries and the algorithm for learning unary path queries are based on existing techniques, tree pattern homomorphisms [27, 26] and pattern learning [2, 37], but we employ them in new, nontrivial ways. The remaining results, including the remaining learning algorithms and intractability of the consistency problem, are new and nontrivial.

The paper is organized as follows. In Section 2 we introduce basic notions and define formally the learning framework. In Section 3 we define the learnable subclasses of queries and identify their essential properties that enable our learning algorithms. In Sections 4 through 7 we present the corresponding learning algorithms. In Section 8 we discuss the impact of negative examples on learning. We discuss the related work in Section 9. Finally, we summarize our results and outline further directions in Section 10. Because of space restriction we present only sketches of the most important proofs; complete proofs will be given in the full version of the paper (currently in preparation for journal submission).

Acknowledgments. We would like to thank our fellow colleagues and anonymous reviewers for their helpful comments. We also would like to thank Radu Ciucanu and Ioana Adam who implemented the algorithms and shared their insights allowing to improve theoretical properties of the algorithms.
This research has been partially supported by Ministry of Higher Education and Research, Nord-Pas de Calais Regional Council and FEDER through the Contrat de Projets Etat Region (CPER) 2007-2013, Codex project ANR-08-DEFIS-004, and Polish Ministry of Science and Higher Education research project N N206 371339.

2. BASIC NOTIONS
Throughout this paper we assume an infinite set of node labels Σ which allows us to model documents with textual values. We also assume that Σ has a total order, that can be tested in constant time, and has a minimal element that can be obtained in constant time as well. We extend the order on Σ to the standard lexicographical order ≤_lex on words over Σ and define a well-founded canonical order on words: w ≤_can u iff |w| < |u| or |w| = |u| and w ≤_lex u.

Trees. We model XML documents with unranked labeled trees. Formally, a tree t is a tuple (N_t, root_t, lab_t, child_t), where N_t is a finite set of nodes, root_t ∈ N_t is a distinguished root node, lab_t : N_t → Σ is a labeling function, and child_t ⊆ N_t × N_t is the parent-child relation. We assume that the relation child_t is acyclic and require every non-root node to have exactly one predecessor in this relation. By Tree_0 we denote the set of all trees. The size of a tree is the cardinality of its node set. The depth of a node is the length of the path from the root to the node and the height of the tree is the depth of its deepest leaf. For a tree t by Paths(t) we denote the set of paths from the root node to the leaf nodes of t. We view a path both as a tree, in particular it has nodes, and as a word. Often, we use unranked terms over Σ to represent trees. For instance, the term ((), (()), (())) corresponds to the tree t_0 in Figure 3(a).

[Figure 3: Trees. (a) Tree t_0. (b) Decorated trees t_1 and t_2.]

To represent examples and answers to queries, we use trees with one distinguished selected node. Formally, a decorated tree is a pair (t, sel_t), where t is a tree and sel_t ∈ N_t is a distinguished selected node. We denote the set of all decorated trees by Tree_1. Figure 3(b) contains two decorated versions of t_0: the selected node is indicated with a square box. In the sequel, we rarely make the distinction between standard trees and decorated ones, and when it does not lead to ambiguity, we refer to both structures as simply trees.

Queries. We work with the class of twig queries, also known as tree pattern queries [1].
Twig queries are essentially unordered trees whose nodes may be additionally labeled with a distinguished wildcard symbol ⋆ and that use two types of edges, child and descendant, corresponding to the standard XPath axes. To model unary queries we also distinguish a selecting node.

[Figure 4: Twig queries. (a) Boolean twig query q_0. (b) Unary path query p_0.]

A Boolean twig query q is a tuple (N_q, root_q, lab_q, child_q, desc_q), where N_q is a finite set of nodes, root_q ∈ N_q is the root node, lab_q : N_q → Σ ∪ {⋆} is a labeling function, child_q ⊆ N_q × N_q is a set of child edges, and desc_q ⊆ N_q × N_q is a set of descendant edges. We assume that child_q ∩ desc_q = ∅ and

that the relation child_q ∪ desc_q is acyclic and require every non-root node to have exactly one predecessor in this relation. By Twig_0 we denote the set of all Boolean twig queries. A unary twig query is a pair (q, sel_q), where q is a Boolean twig query and sel_q ∈ N_q is a distinguished selecting node. We denote the set of all unary twig queries by Twig_1. Figure 4 contains examples of twig queries: child edges are drawn with a single line, descendant edges with a double line, and the selecting node is indicated with a square box.

Additionally, we use restricted classes of Boolean and unary path queries, Path_0 and Path_1 respectively. Formally, Path_i contains those elements of Twig_i whose nodes have at most one child. Furthermore, the selecting node of a unary path query is always its only leaf (cf. Figure 4(b)).

We note that Twig_1 captures exactly the class of descending positive disjunction-free XPath queries, and in the sequel, we use elements of the abbreviated XPath syntax [43, 44] to present both elements of Twig_1 and Twig_0. For instance, the query in Figure 4(a) can be written as /[]//, and the query in Figure 4(b) as ///.

Because no unary twig query can select at the same time the root node and another node of a tree, we disallow the root to be an answer, and from now on, we consider only unary queries and decorated trees whose selected node is other than the root. Note that this restriction can be easily bypassed by adding a virtual root node to every tree in the input sample. Also, this way the universal query is //⋆.

Embeddings. We define the semantics of twig queries using the notion of embedding which is essentially a mapping of nodes of a query to the nodes of a tree (or another query) that respects the semantics of the edges of the query. In the sequel, for two x, y ∈ Σ ∪ {⋆} we say that x matches y if y ≠ ⋆ implies x = y. Note that this relation is not symmetric: a matches ⋆ but ⋆ does not match a. Formally, for i ∈ {0, 1}, a query q ∈ Twig_i and a tree t ∈ Tree_i, an embedding of q in t is a function λ : N_q → N_t such that:
1. λ(root_q) = root_t,
2. for every (n, n') ∈ child_q, (λ(n), λ(n')) ∈ child_t,
3. for every (n, n') ∈ desc_q, (λ(n), λ(n')) ∈ (child_t)^+,
4. for every n ∈ N_q, lab_t(λ(n)) matches lab_q(n),
5. if i = 1, then λ(sel_q) = sel_t.
Then, we write λ : q ↪ t or simply t ≼ q.
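The embedding conditions for Boolean queries can be checked by a simple recursion, because an embedding need not be injective and every query edge can therefore be handled independently. The sketch below is only an illustration under assumed encodings that are not part of the paper: a tree is a pair (label, children), a query is a pair (label, edges) where each edge is ("/", subquery) or ("//", subquery), and the string "*" stands for the wildcard. It covers conditions 1 to 4 only; the selecting node of unary queries is not modelled.

```python
WILDCARD = "*"  # assumed encoding of the wildcard symbol


def matches(tree_label, query_label):
    """A tree label matches a query label if the latter is the wildcard or equal."""
    return query_label == WILDCARD or query_label == tree_label


def proper_descendants(tree):
    """Yield every proper descendant subtree (children, grandchildren, ...)."""
    _, children = tree
    for child in children:
        yield child
        yield from proper_descendants(child)


def embeds(query, tree):
    """True if the Boolean query embeds in the tree with root mapped to root."""
    q_label, q_edges = query
    t_label, t_children = tree
    if not matches(t_label, q_label):
        return False
    for axis, q_child in q_edges:
        # "/" must follow a child edge, "//" a path of one or more child edges.
        candidates = t_children if axis == "/" else proper_descendants(tree)
        if not any(embeds(q_child, c) for c in candidates):
            return False
    return True


# Hypothetical example tree r(a(b), a(c(b))).
example_tree = ("r", [("a", [("b", [])]), ("a", [("c", [("b", [])])])])
# Hypothetical query r[a/b]//c.
example_query = ("r", [("/", ("a", [("/", ("b", []))])), ("//", ("c", []))])
print(embeds(example_query, example_tree))
```

The recursion tries every candidate node for every query edge, which is polynomial in the sizes of the query and the tree but not optimized.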
Figure 5 presents all embeddings of the query q_0 in the tree t_0 (Figure 3(a)). Note that we do not require the embedding to be injective i.e., two nodes of the query may be mapped to the same node of the tree. Embeddings of path queries are, however, always injective. Also, note that the semantics of //-edge is that of a proper descendant (and not that of descendant-or-self).

[Figure 5: Embeddings of q_0 in t_0.]

Typically, the semantics of a unary query is defined in terms of the set of nodes it selects in a tree [25, 26]: a node n of a tree t is an answer to a unary twig query q in t if there is an embedding λ : q ↪ t such that λ(sel_q) = n (then n is also said to be reachable by q in t). However, we use an alternative way of defining the semantics of a query. Formally, the language of a query q ∈ Twig_i for i ∈ {0, 1} is the set

L_i(q) = {t ∈ Tree_i | t ≼ q}.

Naturally, the two notions are very closely related e.g., the decorated trees t_1 and t_2 (Figure 3) belong to L_1(p_0) (Figure 4) and the nodes selected in t_1 and t_2 are exactly the answers to p_0 in tree t_0.

The notion of an embedding extends in a natural fashion to a pair of queries q, p ∈ Twig_i for some i ∈ {0, 1}: an embedding of q in p is a function λ : N_q → N_p that satisfies the conditions 1, 2, 4, 5 above (with t being replaced by p) and the following condition:
3'. for all (n, n') ∈ desc_q, (λ(n), λ(n')) ∈ (child_p ∪ desc_p)^+.
Then, we write λ : p ↪ q or simply q ≼ p and say that p subsumes q.

The containment (or inclusion) q ⊆ p of two queries q, p ∈ Twig_i for i ∈ {0, 1} is simply L_i(q) ⊆ L_i(p), and we say that q and p are equivalent, denoted q ≡ p, if q ⊆ p and p ⊆ q. Note that for twigs, subsumption implies containment i.e., if q ≼ p, then q ⊆ p. The converse does not hold in general. For instance, we have ⋆[.//⋆] ⊆ ⋆[⋆] but ⋆[.//⋆] ⋠ ⋆[⋆]. There are also significant computational differences: the containment of twigs is coNP-complete [36, 30] whereas their subsumption is in PTIME.

Query minimality. In this paper we identify queries that are minimal for a given set of trees (as examples). It is important to emphasise that we always mean minimality in terms of query inclusion.
Formally, for i ∈ {0, 1}, a class of queries Q ⊆ Twig_i, a query q ∈ Q, and a set of trees S ⊆ Tree_i, we say that q is a minimal query in Q consistent with S if S ⊆ L_i(q) and there is no q' ∈ Q such that q' ⊆ q, q' ≢ q, and S ⊆ L_i(q').

Learning framework. We use a variant of the standard language inference framework [22, 21, 32, 13] adapted to learning queries. A learning setting comprises the set of concepts that are to be learnt, in our case queries, and the set of instances of the concepts that are to serve as examples in learning, in our case trees (possibly decorated). These two sets are bound together by the semantics which maps every concept to its set of instances.

Definition 1 A learning setting is a tuple (D, Q, L), where D is a set of examples, Q is a class of queries, and L is a function that maps every query in Q to the set of all its examples (a subset of D).

As an example, a setting for learning unary twig queries from positive examples is the tuple (Tree_1, Twig_1, L_1). This general formulation allows also to easily define settings for learning from both positive and negative examples, which we present in Section 8.

To define formally what learnability for queries means we fix a learning setting K = (D, Q, L) and introduce some auxiliary notions. A sample is a finite nonempty subset S of D i.e., a set of examples. The size of a sample is the sum of the sizes of the examples it contains. A sample S is consistent with a query q ∈ Q if S ⊆ L(q). A learning algorithm is an algorithm that takes a sample and returns a query in Q or a special value Null.

Definition 2 A query class Q is learnable in polynomial time and data in the setting K = (D, Q, L) iff there exists a polynomial learning algorithm learner and a polynomial poly such that the following two conditions are satisfied:
1. Soundness. For any sample S the algorithm learner(S) returns a query consistent with S or a special Null value if no such query exists.
2. Completeness. For any query q ∈ Q there exists a sample CS_q such that for every sample S that extends CS_q consistently with q i.e., CS_q ⊆ S ⊆ L(q), the algorithm learner(S) returns a query equivalent to q. Furthermore, the size of CS_q is bounded by poly(|q|).

The sample CS_q is often called the characteristic sample for q w.r.t. learner and K but we point out that for a learning algorithm there may exist many samples fitting the role and the definition of learnability requires merely that one such sample exists.

The soundness condition is a natural requirement but alone it is insufficient to eliminate trivial learning algorithms. For instance, for the setting where only positive examples are used, an algorithm returning the universal query //⋆ is sound. Consequently, we require the algorithm to be complete analogously to how it is done for grammatical language inference [21, 32, 13]. An alternative and natural way to ban trivial learning algorithms would be to require the algorithm to return some minimal query consistent with the input sample.
Our approach follows this direction but as we show later on, it is not possible to fully adhere to it because there exist samples for which the minimal consistent twig query is of exponential size.

3. LEARNABLE QUERY CLASSES
In this section we define the classes of queries, that in the following sections we prove learnable from positive examples, and identify two essential properties of these classes that enable our learning algorithms. Both properties follow from the importance of logical implication in learning: learning can often be seen as a search of the correct hypothesis obtained by an iterative refinement of some initial hypothesis and at every iteration the current hypothesis is often a logical consequence of the previous one.

The first property requires the containment to be equivalent to subsumption, which allows to capture containment with a simple structural characterization. The second property is the existence of polynomially sized match sets [26], which were originally introduced as an easy way of testing query inclusion. The match sets that we construct will serve us as the characteristic samples. We emphasise that the full classes of twig and path queries do not have these properties but this does not imply that they are not learnable but it merely precludes the direct adaptation of our learning techniques. Whether the full classes of queries are learnable remains an open question.

To formally define the two properties, we fix a class of queries Q with their semantics defined by L. The properties are:
(P1) for every two q_1, q_2 ∈ Q, q_1 ⊆ q_2 if and only if q_1 ≼ q_2.
(P2) every q ∈ Q has a polynomial match set i.e., a set CS_q of (positive) examples such that the size of CS_q is polynomial in the size of q and for every q' ∈ Q we have q ⊆ q' if and only if CS_q ⊆ L(q').
We next present the construction of match sets in a generic form and then we introduce the learnable classes of queries and state the properties P1 and P2 for them.

3.1 Match sets as characteristic samples
We now present the construction of match sets that will be later on used as characteristic samples. Because the constructions of the match sets for all the subclasses of queries are very similar, we present it in a generic form.
Take a twig query q, let N be the size of q, a_0 be the minimal element of Σ, and a_1 and a_2 be two fresh symbols not used in q and different from a_0. The constructed match set CS_q contains exactly two trees:
t_0 is obtained from q by replacing every ⋆ with a_0 and every descendant edge by a child edge;
t_1 is obtained from q by replacing every ⋆ with a_1 and every descendant edge with a path of length N whose all nodes are labeled with a_2.
Figure 6 contains the characteristic sample for the unary twig query q_1 = /[//]//[]//.

[Figure 6: The characteristic sample for q_1 (the query q_1 and the trees t_0 and t_1).]

We point out that for a query and a learning algorithm there might be more than just one characteristic sample. This is also the case with our learning algorithms. While the construction we present above might seem quite artificial, we use it due to its properties that might be of independent interest (match sets). Simpler, and easier to compose by an unskilled user, characteristic samples are often possible.
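The construction above can be sketched in a few lines. The encoding is an assumption made for illustration only: a query is a pair (label, edges) with edges ("/", subquery) or ("//", subquery), the wildcard is the string "*", trees are pairs (label, children), and the strings "a0", "a1", "a2" are hypothetical stand-ins for the minimal label and the two fresh labels. The sketch reads "a path of length N" as N intermediate nodes, which is one possible interpretation, and it ignores the selecting node of unary queries.

```python
def size(query):
    """Number of nodes of the query."""
    _, edges = query
    return 1 + sum(size(child) for _, child in edges)


def build_tree(query, star_label, pad_label, pad_len):
    """Turn a query into a tree: wildcards become star_label and every
    descendant edge becomes a chain of pad_len nodes labelled pad_label."""
    label, edges = query
    children = []
    for axis, q_child in edges:
        subtree = build_tree(q_child, star_label, pad_label, pad_len)
        if axis == "//":
            for _ in range(pad_len):
                subtree = (pad_label, [subtree])
        children.append(subtree)
    return (star_label if label == "*" else label, children)


def match_set(query, a0="a0", a1="a1", a2="a2"):
    """Return the two trees (t0, t1) built from the query."""
    n = size(query)
    t0 = build_tree(query, a0, a2, 0)  # descendant edges collapse to child edges
    t1 = build_tree(query, a1, a2, n)  # descendant edges stretch to long chains
    return t0, t1


# Hypothetical query r//* with two nodes, so the chain has two a2 nodes.
t0, t1 = match_set(("r", [("//", ("*", []))]))
print(t0)
print(t1)
```

Both trees have size polynomial in the size of the query: t_0 has as many nodes as q, and t_1 adds at most N nodes per descendant edge.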

3.2 Anchored path queries
We begin with a base subclass of path queries, called anchored path queries. Essentially, a path query is anchored when no inner ⋆-node is incident to a //-edge. The main reason for introducing this class of queries is that when working with their embeddings the restriction on the use of // allows us to limit the jumps that the embedding may perform in between two nodes connected by a descendant edge. An additional restriction on the leaf node of Boolean path queries is imposed for technical reasons (cf. proof of Lemma 3.1 for more details).

Formally, the class of unary anchored path queries imposes one restriction: a //-edge cannot be incident to a ⋆-node unless it is the root node or the leaf node (which is also selecting). For instance, the unary queries //////, /////, and // are anchored but the query ///// is not. An additional restriction is imposed on the Boolean anchored path queries: if the leaf node is ⋆, then the edge incident to it is //. For instance, the Boolean queries ////// and //////// are anchored but the Boolean query ///// is not anchored. We denote by AnchPath_1 and AnchPath_0 the sets of unary and Boolean anchored path queries respectively.

Clearly, the subclasses of anchored path queries are properly included in the full classes of path queries, however, we believe that the restrictions are not very limiting and the classes of anchored queries remain practical. Basically, anchored path queries cannot discriminate the descendants of a node based on their depth alone. We also point out that the additional restriction imposed on Boolean queries is quite minor: the Boolean query /// is not anchored but it is equivalent to //// which is anchored. Note, however, that the Boolean query //// does not have an equivalent Boolean anchored query.

While P1 for anchored path queries follows from the results in [27, 26], below we present a proof using a technique that allows to show P1 and P2 for all the query classes we introduce later on (and these results are new and cannot be derived from the results in [27, 26]).

Lemma 3.1 Unary and Boolean anchored path queries have the properties P1 and P2.

To prove this lemma it suffices to show the following claim.

Claim 1 For any i ∈ {0, 1} and any two q, q' ∈ AnchPath_i, if CS_q ⊆ L_i(q'), then there exists an embedding λ : q' ↪ q.
Proof We first give an equivalent yet more structured definition of anchored path queries. A block is a path query fragment B of the form σ_0/.../σ_n, where n ≥ 0, σ_0, σ_n ∈ Σ, and σ_1, ..., σ_{n−1} ∈ Σ ∪ {⋆}. An anchored path query q is a path query of the form B_0//B_1//...//B_k, where k ≥ 0, B_i is a block for 1 ≤ i ≤ k−1, and B_0 is either a block that can start with ⋆ or a single occurrence of ⋆. Also, in case of Boolean anchored path queries B_k is either a block or a single occurrence of ⋆ and in case of unary anchored path queries B_k is either a block that can end with ⋆ or a single occurrence of ⋆.

We first prove the claim for unary queries (i.e., i = 1). Let N = |q| and CS_q = {t_0, t_1} be constructed as described in Section 3.1. For every node n of t_1 whose label is not a_2 by origin(n) we denote the node of q corresponding to n. Also, fix λ_1 : q' ↪ t_1.

We make several observations. First, |q'| ≤ N, or otherwise there would be no embedding of q' into t_0. For the same reason, q' does not use the labels a_1 and a_2. Therefore, if a node n of q' is mapped by λ_1 to a node with label a_1 or a_2, then lab_q'(n) = ⋆.

Next, we show that λ_1 maps nodes of q' only to those nodes of t_1 that are not labeled with a_2. This is clearly the case for the root node and the selecting node of q', that are mapped to the root node and the selecting node of t_1, and from the construction of t_1, they have labels different from a_2. In the following we show that this is the case with other nodes. Let q' be of the form B_0//B_1//...//B_k. Note that if a node n is on the border of B_j (for 0 ≤ j ≤ k) then from the definition of block n cannot be mapped to a_2. This is because n is either a root node, or a selecting node or its label is not ⋆. Suppose, that some node of q' belonging to B_i for 0 ≤ i ≤ k, is mapped to a node with label a_2 and let n_1 and n_2 be the nodes that are on the borders of B_i. Because |B_i| ≤ |q'| ≤ N and in t_1 nodes labeled with a_2 come in sequences of length N, one of the nodes n_1 and n_2 needs to be mapped to a node labeled with a_2. This implies that one of n_1 and n_2 is labeled with ⋆; a contradiction.

This shows that λ = origin ∘ λ_1 (first λ_1, then origin) is a properly defined function mapping N_q' to N_q. We now show that λ is an embedding of q' into q.
The condition 2 holds because λ_1 preserves the child relation and if nodes (n_1, n_2) are in child_t1 and both are not labelled with a_2 then (origin(n_1), origin(n_2)) ∈ child_q. The conditions 1, 3, and 5 follow from the definition of λ. For the condition 4, take any n ∈ N_q' such that σ = lab_q'(n) ≠ ⋆ and note that then, λ_1(n) in t_1 has the same label σ which is different from a_1 (because q' does not use a_1) and a_2 (as shown above). Therefore the node of q that corresponds to λ_1(n) is labeled with σ as well.

The proof for Boolean anchored path queries is analogous and it suffices to consider the case when B_k is a single occurrence of ⋆. Then indeed an embedding λ_1 : q' ↪ t_1 may map the ⋆-leaf to a node labeled with a_2. We note, however, that λ_1 can be easily altered to map the ⋆-leaf to a non a_2-node because the ⋆-leaf is connected with a descendant edge and every a_2 node in t_1 has a descendant that is not labeled with a_2.

To show P1 it is enough to show the implication from left to right. Assume q ⊆ q' and note that CS_q ⊆ L(q). Therefore, CS_q ⊆ L(q'), which by Claim 1, gives us q ≼ q'. P2 follows directly from Claim 1.

3.3 Conjunctions of anchored path queries
In our approach to learn twig queries we use path learning algorithms to infer a set of path queries satisfied in the input sample and then we combine these path queries into a twig query. Therefore, the midpoint between learning path queries

and twig queries is learning conjunctions of path queries. We apply this technique only to learn Boolean twig queries and so we focus only on learning Boolean conjunctions of path queries. For convenience we use sets of Boolean path queries to represent conjunctions, but a conjunction can also be seen as a Boolean twig query consisting of path queries meeting at the root node. The second representation is used to define the semantics of conjunctions and their characteristic samples.

Because our path learning algorithms infer anchored queries, we consider only conjunctions of Boolean anchored path queries. Also, if we have inferred two Boolean path queries $p_1$ and $p_2$, and $p_1$ subsumes $p_2$, then from the point of view of learning there is no point in keeping $p_2$, because $p_1$ contains more specific information and makes $p_2$ redundant. Consequently, we consider only reduced conjunctions i.e., those having no two different $p_1, p_2$ such that $p_1 \subseteq p_2$. Naturally, the conjunctions must also be head-consistent i.e., any two path queries in a conjunction must have the same root label, for otherwise we would not be able to represent the conjunction as a twig query. By $ConjPath_0$ we denote the class of conjunctions of Boolean anchored path queries satisfying the restrictions described above. The use of anchored path queries allows us to prove the following lemma in a manner analogous to the proof of Lemma 3.1.

Lemma 3.2. Conjunctions of Boolean anchored path queries have the properties $P_1$ and $P_2$.

3.4 Path-subsumption-free twig queries

As mentioned previously, our learning algorithms for twig queries attempt to construct the query $q$ by combining the path queries from a conjunction inferred beforehand. Because we infer a reduced set of path queries, the constructed Boolean twig query $q$ has no two different $p_1, p_2 \in Paths(q)$ such that $p_1 \subseteq p_2$, where $Paths(q)$ is the set of Boolean path queries on paths from the root to all leaves of $q$. Naturally, all path queries in $Paths(q)$ need to be anchored. Formally, a Boolean twig query $q$ is path-subsumption-free iff $Paths(q)$ is a reduced set of Boolean anchored path queries, and by $PsfTwig_0$ we denote the class of Boolean path-subsumption-free twig queries. The restrictions are relaxed slightly for unary twig queries and reflect our learning algorithm, which first infers a unary anchored path and next decorates it with elements of $PsfTwig_0$ used as filter expressions.
Recall that the selecting path in a unary twig query is the path query on the path from the root node to the selecting node. Formally, a unary twig query $q$ is path-subsumption-free iff the unary path query from the root node to the selecting node of $q$ is anchored and every Boolean path query on the path ending at a (non-selecting) leaf node and beginning at the closest node on the selecting path is anchored. By $PsfTwig_1$ we denote the class of unary path-subsumption-free twig queries.

The classes of path-subsumption-free twig queries may seem at first very limited. We note, however, that a twig query belongs to our class if every leaf label is different or every pair of leaves with the same label cannot be compared with $\subseteq$ (and all paths are anchored). This simple sufficient condition yields a large class of twig queries used in practice, especially if we consider the following remark. One of the advantages of considering an infinite set of labels $\Sigma$ is the ability to capture textual values (stored in the leaves of a tree). Then, non-selecting leaves of tree patterns are used for equality tests of text values, and rarely is the same value used to make an equality test (on similar paths).

Lemma 3.3. Path-subsumption-free twig queries have the properties $P_1$ and $P_2$.

We point out that in the proof of Lemma 3.3 we only use the fact that path-subsumption-free twig queries are constructed from anchored paths. The other restriction, namely that the path queries in $Paths(q)$ cannot subsume one another, is not used in this proof and is essential only for the proper work of our learning algorithms. Our recent results show that learning is possible without this restriction and we intend to present these findings in the journal version of the paper.

4. LEARNING UNARY PATH QUERIES

In Figure 7 we present a learning algorithm for the learning setting $AnchPath_1 = (Tree_1, AnchPath_1, L_1)$, inspired by and extending several learning algorithms for regular string patterns [2, 37] (cf. Section 9 for more details). Recall that $SelPath(t)$ stands for the path from the root node to the selecting node of $t$; we extend it to samples by $SelPath(S) = \{SelPath(t) \mid t \in S\}$. The algorithm begins with the universal path query ⋆//⋆ and considers only the paths from the root to the selected nodes in the input sample. It constructs the path query in three stages.
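algorithm learnerAnchPath$_1$($S$)
Input: a sample $S \subseteq Tree_1$ of decorated trees
Output: a minimal $p \in AnchPath_1$ such that $S \subseteq L_1(p)$
1: $w := \min_{\leq_{can}}(SelPath(S))$
2: let $w$ be of the form $a_0/a_1/\cdots/a_n$
3: $p := \star//\star$
4: foreach subpath $u$ of $a_1/a_2/\cdots/a_{n-1}$ in the order of decreasing lengths do
5:     replace in $p$ any //-edge by $//u//$ as long as $S \subseteq L_1(p)$
6: let $p$ be of the form $\star_0//p_0//\star_1$
7: if $S \subseteq L_1(p\{\star_0 \leftarrow a_0\})$ then
8:     $p := p\{\star_0 \leftarrow a_0\}$
9: if $S \subseteq L_1(p\{\star_1 \leftarrow a_n\})$ then
10:     $p := p\{\star_1 \leftarrow a_n\}$
11: foreach descendant edge $\alpha$ in $p$ do
12:     find maximal $l$ s.t. $S \subseteq L_1(p\{\alpha \leftarrow //(\star/)^l\})$
13:     if $S \subseteq L_1(p\{\alpha \leftarrow /(\star/)^l\})$ then
14:         $p := p\{\alpha \leftarrow /(\star/)^l\}$
15: return $p$

Figure 7: Learning algorithm for $AnchPath_1$.

In the first stage (lines 4 through 5) the algorithm attempts to identify a collection of factors, essentially path fragments, that are mutually common to every path in $SelPath(S)$. Note that if a factor is present in every path, then it is also present in the $\leq_{can}$-minimal path $w$. The candidate query $p$ is gradually refined with the factors, and the invariant is that these factors are mutually present on every path in $SelPath(S)$ and in the specified order. For every new candidate factor $u$, learnerAnchPath$_1$ attempts to find a place where $u$ can be inserted so as to yield a path query $p$ consistent with $S$.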
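Every stage of the algorithm relies on the test $S \subseteq L_1(p)$. Since only the selecting paths matter for unary path queries, this test can be sketched as word matching. The following is a minimal illustration and not the authors' implementation: the names `parse`, `matches` and `consistent`, the `*` spelling of the wildcard, and the list-of-labels encoding of a selecting path are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

Step = Tuple[Optional[str], str]  # (axis, label); the root step has axis None


def parse(query: str) -> List[Step]:
    """Parse a relative path query such as 'r/*/a/b//*' into steps."""
    steps: List[Step] = []
    axis: Optional[str] = None
    for token in query.split("/"):
        if token == "":          # an empty token marks the second slash of '//'
            axis = "//"
        else:
            steps.append((axis, token))
            axis = "/"
    return steps


def matches(query: List[Step], path: List[str]) -> bool:
    """True if the unary path query maps its root to path[0] and its
    selecting (last) node to path[-1]."""
    def ok(label: str, node: str) -> bool:
        return label == "*" or label == node

    if not query or not path or not ok(query[0][1], path[0]):
        return False
    positions = {0}              # indices of `path` the current query node may take
    for axis, label in query[1:]:
        if axis == "/":          # child edge: the very next node on the path
            positions = {i + 1 for i in positions
                         if i + 1 < len(path) and ok(label, path[i + 1])}
        else:                    # descendant edge: any strictly later node
            positions = {k for k in range(min(positions) + 1, len(path))
                         if ok(label, path[k])}
        if not positions:
            return False
    return len(path) - 1 in positions


def consistent(query: List[Step], selecting_paths: List[List[str]]) -> bool:
    """The test S ⊆ L_1(p), restricted to the selecting paths of S."""
    return all(matches(query, path) for path in selecting_paths)


# Hypothetical selecting paths, chosen only for illustration.
sample = [["r", "x", "a", "b", "c"], ["r", "y", "a", "b", "z", "d"]]
print(consistent(parse("r/*/a/b//*"), sample))  # True
print(consistent(parse("r//a/b/*"), sample))    # False
```

The set `positions` keeps all admissible images of the current query node, so the check runs in time polynomial in the sizes of the query and the path.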

In the second stage (lines 6 through 10), the algorithm takes the query $p$ and attempts to specialize the first and the last occurrences of the wildcard ⋆ i.e., replace them with the corresponding symbol taken from $w$. Here, $p\{x \leftarrow e\}$ creates a copy of $p$ and replaces in it the reference $x$ by the expression $e$ (the original $p$ remains unchanged). In the third stage (lines 11 through 14) the algorithm attempts to specialize every //-edge in $p$ i.e., replace it with a maximally long sequence /⋆/⋆/.../⋆/.

Example 3. In this example we show the execution of the algorithm learnerAnchPath$_1$ on the sample $\{t_1, t_2, t_3\}$ presented in Figure 8 together with the path queries constructed during the execution.

Figure 8: A sample ($t_1$, $t_2$, $t_3$) and the constructed queries ($p_0$, $p_1$, $p_2$).

In the first stage the algorithm identifies the factor a/b present in every selecting path, and the resulting path query is $p_0$ = ⋆//a/b//⋆. There is no other common factor and the algorithm moves to the second stage, where it specializes the root node of $p_0$, obtaining this way $p_1$ = r//a/b//⋆; the selecting node cannot be specialized because the selected nodes of $t_1$ and $t_2$ have two different labels, c and d resp. Finally, the algorithm attempts to specialize the descending edges. Only the top one can be replaced by a ⋆-path of length 1, yielding $p_2$ = r/⋆/a/b//⋆, which is also the final result of the learning algorithm.

There are aspects of the algorithm that are not fully specified e.g., from two different subpaths of the same length which one should be chosen first in the loop in line 4. We do not enforce any particular choice because it is inessential from the theoretical point of view (soundness and completeness), and in practical implementations the choice could be made with the help of heuristics.

Example 4. Consider the sample consisting of the two trees: (((()))) and ((((())))). In the first stage the algorithm may identify either factor / or / but not both of them. As a consequence the algorithm may return one of two possible queries $p_1$ = //// or $p_2$ = ////. In order to make the algorithm deterministic we may enforce some order of processing among candidate factors of the same length e.g., from left to right.

We observe that $S \subseteq L_1(p)$ is an invariant maintained throughout learnerAnchPath$_1$, and with a simple analysis one can show that learnerAnchPath$_1$ is sound for $AnchPath_1$.
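The left-to-right order suggested in Example 4 can be made concrete. The sketch below enumerates the candidate factors of line 4 of Figure 7, longest first and, among factors of equal length, from left to right. The function name `candidate_factors` and the encoding of $w$ as a list of labels are assumptions of this illustration; the paper itself leaves the order among equal-length factors open.

```python
from typing import Iterator, List, Sequence


def candidate_factors(w: Sequence[str]) -> Iterator[List[str]]:
    """Yield the subpaths of a_1/.../a_{n-1} (w without its first and last
    label) in order of decreasing length, ties broken left to right."""
    inner = list(w[1:-1])
    for length in range(len(inner), 0, -1):
        for start in range(len(inner) - length + 1):
            yield inner[start:start + length]


# Hypothetical minimal selecting path, used only for illustration.
for factor in candidate_factors(["r", "a", "b", "c", "d"]):
    print("/".join(factor))
```

For a path with $n+1$ labels this produces at most $n(n-1)/2$ candidates, so the first stage performs a polynomial number of consistency tests. Repeated factors are not filtered out here; a practical implementation might deduplicate them.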
What makes this algorithm particularly interesting is the following.

Lemma 4.1. The algorithm learnerAnchPath$_1$ returns a minimal anchored path query consistent with the input sample.

We prove the claim below, which by $P_1$ for $AnchPath_1$ is equivalent to the lemma above.

Claim 2. If learnerAnchPath$_1$($S$) returns $p$, then there is no unary anchored path query $q \neq p$ such that $q \subseteq p$ and $S \subseteq L_1(q)$.

Proof. Suppose otherwise and take a unary anchored path query $q \subsetneq p$ having an embedding $\lambda : p \to q$. We note that $q$ can be viewed as the result of applying a substitution $\theta$ i.e., $q = p\theta$, which substitutes in $p$ some of the labels of ⋆-nodes with labels in $\Sigma$ and replaces some of the //-edges with path queries. This substitution can be decomposed into a composition $\theta = \theta_1 \circ \theta_2 \circ \ldots \circ \theta_k$ of atomic operations, which for brevity we present here as rewriting rules: 1) $// \to /$, replacing a //-edge with a child edge, 2) $// \to //B//$, replacing a //-edge with a block $B$ (cf. proof of Claim 1), 3) $\star \to a$, changing the label of a ⋆-node to some $a \in \Sigma$, 4) $// \to //(\star/)^l$, replacing a //-edge with a ⋆-path. Now, if we take the path query $q' = p\theta_1$, then $q \subseteq q' \subseteq p$ and $q' \neq p$. Consequently, it suffices to assume that $q$ is obtained from $p$ by applying just one atomic substitution $\theta$. Essentially, the first three types of atomic operations allow us to identify a new factor or a longer factor that would have been discovered and properly incorporated into the resulting query during the execution of learnerAnchPath$_1$($S$) in lines 4-5. The last type, $// \to //(\star/)^l$, allows us to identify a descending edge that would have been converted to a ⋆-path in lines 11-14. These arguments show that $p$ could not have been the result of learnerAnchPath$_1$($S$); a contradiction.

We argue that Lemmas 3.1 and 4.1 imply completeness of learnerAnchPath$_1$ w.r.t. $AnchPath_1$. Indeed, if $CS_q \subseteq S$ and learnerAnchPath$_1$($S$) returns $p$, then $q \subseteq p$ because $CS_q \subseteq L_1(p)$, but there is no query $q' \subsetneq p$ such that $S \subseteq L_1(q')$. Hence, $q$ and $p$ are equivalent.

Theorem 4.2. Anchored path queries are learnable in polynomial time and data from positive examples (i.e., in the setting $AnchPath_1$).

5. LEARNING BOOLEAN PATH QUERIES

Learning Boolean path queries is more challenging than learning unary path queries. In decorated trees, which are the examples for learning unary queries, the selected nodes unambiguously indicate the path to be matched by the query.
The examples for learning a Boolean path query are trees with no indication of the path the constructed query should match. To address this problem we devise an algorithm that infers a conjunction of Boolean anchored path queries that are satisfied in the given sample. Recall that $AnchPath_0$ is the class of Boolean anchored path queries and $ConjPath_0$ is the class of reduced and head-consistent conjunctions of Boolean anchored path queries (represented as sets of Boolean path queries). The corresponding learning settings are $AnchPath_0 = (Tree_0, AnchPath_0, L_0)$ and

$ConjPath_0 = (Tree_0, ConjPath_0, L_0)$, where $L_0$ interprets a set of path queries $P$ as the twig query obtained by gluing the root nodes together.

Figure 9 contains the learning algorithms for $ConjPath_0$ and $AnchPath_0$. First, we introduce learnerAnchPath$_0$($u$, $S$), a helper learner derived from learnerAnchPath$_1$, which infers a minimal Boolean anchored path query that is satisfied by the given path $u$ and every tree in the input sample. Note that to ensure that the output is a Boolean anchored query, learnerAnchPath$_0$ skips the specialization of the last //-edge if doing so would yield a query that is not anchored (i.e., ending with ⋆ not preceded immediately by //). The purpose of taking the initial path $u$ from the input is the ability to consider every path in $S$ as the word in which to search for common factors.

algorithm learnerAnchPath$_0$($u$, $S$)
Input: a path $u$ and a sample $S \subseteq Tree_0$ of trees
Output: a minimal $p \in AnchPath_0$ s.t. $S \cup \{u\} \subseteq L_0(p)$
This algorithm is obtained from learnerAnchPath$_1$ by:
- initializing $w$ to $u$ (line 1),
- replacing every $S \subseteq L_1(p)$ by $S \cup \{w\} \subseteq L_0(p)$,
- skipping the execution of the loop in lines 11-14 for the last //-edge if $\star_1$ has not been specialized (i.e., $p$ still ends with ⋆).

algorithm learnerConjPath$_0$($S$)
Input: a sample $S \subseteq Tree_0$ of trees
Output: a set of minimal queries $P \subseteq AnchPath_0$ such that $S \subseteq L_0(P)$
1: $P := \emptyset$
2: for $u \in Paths(S)$ do
3:     $p :=$ learnerAnchPath$_0$($u$, $S$)
4:     if $\nexists q \in P.\ q \subseteq p$ then
5:         $P := P \setminus \{q \in P \mid p \subseteq q\}$
6:         $P := P \cup \{p\}$
7: return $P$

algorithm learnerAnchPath$_0$($S$)
Input: a sample $S \subseteq Tree_0$ of trees
Output: a minimal $p \in AnchPath_0$ such that $S \subseteq L_0(p)$
1: $P :=$ learnerConjPath$_0$($S$)
2: choose any $p$ from $P$
3: return $p$

Figure 9: Learning $ConjPath_0$ and $AnchPath_0$.

Essentially, learnerConjPath$_0$ considers every path $u$ in a tree of $S$ and uses learnerAnchPath$_0$ to find a most specific (i.e., minimal) Boolean path query $p$ satisfied by $u$ and every other element of $S$. The set $P$ aggregates all minimal results of running learnerAnchPath$_0$ over all paths in the input sample. The learning algorithm learnerAnchPath$_0$($S$) simply takes the result of learnerConjPath$_0$ and chooses one element. The choice is arbitrary, but later we show that in the presence of the characteristic sample learnerConjPath$_0$ returns a singleton and there is no ambiguity.

Example 5. We run learnerConjPath$_0$ on the sample $S_0$ (Figure 10), which corresponds to the positive examples from Example 2, simplified for clarity of presentation.
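The aggregation step in lines 4-6 of learnerConjPath$_0$ can be sketched as follows. The sketch assumes, in line with how the proofs in this paper use the two notions, that containment $p \subseteq q$ between Boolean anchored path queries coincides with the existence of an embedding of $q$ into $p$. The names `parse`, `embeds`, `contained_in` and `add_minimal`, and the `*` spelling of the wildcard, are hypothetical.

```python
from typing import List, Optional, Tuple

Step = Tuple[Optional[str], str]  # (axis, label); the root step has axis None


def parse(query: str) -> List[Step]:
    """Parse a relative path query such as 'offer//item/desc' into steps."""
    steps: List[Step] = []
    axis: Optional[str] = None
    for token in query.split("/"):
        if token == "":          # an empty token marks the second slash of '//'
            axis = "//"
        else:
            steps.append((axis, token))
            axis = "/"
    return steps


def embeds(general: List[Step], specific: List[Step]) -> bool:
    """True if the Boolean path query `general` embeds into `specific`:
    root to root, labels preserved (a wildcard maps anywhere), child edges
    to child edges, descendant edges to non-empty downward paths."""
    def ok(g: str, s: str) -> bool:
        return g == "*" or g == s

    if not ok(general[0][1], specific[0][1]):
        return False
    positions = {0}
    for axis, label in general[1:]:
        if axis == "/":
            positions = {i + 1 for i in positions
                         if i + 1 < len(specific)
                         and specific[i + 1][0] == "/"
                         and ok(label, specific[i + 1][1])}
        else:
            positions = {k for k in range(min(positions) + 1, len(specific))
                         if ok(label, specific[k][1])}
        if not positions:
            return False
    return True


def contained_in(p: List[Step], q: List[Step]) -> bool:
    """p ⊆ q, assumed here to be equivalent to an embedding of q into p."""
    return embeds(q, p)


def add_minimal(P: List[List[Step]], p: List[Step]) -> List[List[Step]]:
    """Lines 4-6 of learnerConjPath_0: keep only the most specific queries."""
    if any(contained_in(q, p) for q in P):
        return P
    return [q for q in P if not contained_in(p, q)] + [p]


# The five queries obtained for the paths u_1, ..., u_5 in Example 5.
results = ["offer//item/for-sale", "offer//item/desc", "offer//item/for-sale",
           "offer//item/desc", "offer//item//*"]
P: List[List[Step]] = []
for r in results:
    P = add_minimal(P, parse(r))
print(len(P))  # 2
```

On the queries of Example 5 the loop keeps offer//item/for-sale and offer//item/desc and discards offer//item//⋆, regardless of the order in which the five results arrive.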
Figure 10: Input sample from Example 5.

The set of paths $Paths(S_0)$ in the sample consists of: $u_1$ = offer/item/for-sale, $u_2$ = offer/item/desc, $u_3$ = offer/list/item/for-sale, $u_4$ = offer/list/item/desc, $u_5$ = offer/list/item/wanted. Running learnerAnchPath$_0$ on those paths yields:

learnerAnchPath$_0$($u_1$, $S_0$) = offer//item/for-sale,
learnerAnchPath$_0$($u_2$, $S_0$) = offer//item/desc,
learnerAnchPath$_0$($u_3$, $S_0$) = offer//item/for-sale,
learnerAnchPath$_0$($u_4$, $S_0$) = offer//item/desc,
learnerAnchPath$_0$($u_5$, $S_0$) = offer//item//⋆.

Note that the result of learnerAnchPath$_0$ on $u_5$ is the Boolean anchored query offer//item//⋆ and not the more specific offer//item/⋆, because the latter is not anchored; learnerAnchPath$_0$ skips the attempt to specialize the last //-edge because it is followed by ⋆. The query offer//item//⋆ is, however, subsumed by all the previous queries, and therefore learnerConjPath$_0$($S_0$) returns a set containing only the queries offer//item/for-sale and offer//item/desc. The run of learnerAnchPath$_0$ on $S_0$ returns one of those queries e.g., the one whose string representation is lexicographically minimal, offer//item/desc. While this is not the best choice for Example 2, the negative examples can be used in a heuristic to select the query rejecting the most negative examples, in this case offer//item/for-sale.

Because learnerConjPath$_0$($S$) returns a set $P$ of Boolean path queries that are satisfied in every tree in $S$, this algorithm is sound. Naturally, learnerAnchPath$_0$ is also sound because it returns one element of $P$. To show completeness of both learning algorithms, we point out an important property of learnerAnchPath$_0$. The construction of the characteristic samples $CS_P$ and $CS_p$ is given in Section 3.1.

Lemma 5.1. Take a conjunctive query $P \in ConjPath_0$, let $CS_P = \{t_0, t_1\}$ be the characteristic sample for $P$, and take any sample $S \subseteq L_0(P)$ containing two examples $t'_0, t'_1$ such that $Paths(t'_i) = Paths(t_i)$ for $i \in \{0, 1\}$. Then,
1. for every $u \in Paths(S)$, learnerAnchPath$_0$($u$, $S$) returns a path query equal to or subsumed by some $p \in P$;
2. for every $p \in P$ there exists $u \in Paths(S)$ such that learnerAnchPath$_0$($u$, $S$) returns $p$.

The above result shows completeness of learnerConjPath$_0$. As for learnerAnchPath$_0$, if we take a Boolean anchored path

query $p$ and apply the previous lemma to $P = \{p\}$, we get that for any sample $S$ consistent with $p$ and containing $CS_p$ the algorithm learnerConjPath$_0$($S$) returns the singleton $\{p\}$, and thus learnerAnchPath$_0$($S$) returns $p$. This result allows us to prove learnability of both classes of queries.

Theorem 5.2. The query classes $ConjPath_0$ and $AnchPath_0$ are learnable in polynomial time and data from positive examples (i.e., in the settings $ConjPath_0$ and $AnchPath_0$ resp.).

We also show minimality of learnerAnchPath$_0$.

Lemma 5.3. For any finite $S \subseteq Tree_0$, learnerConjPath$_0$($S$) returns a set of minimal Boolean anchored path queries consistent with $S$, and learnerAnchPath$_0$($S$) returns a minimal Boolean anchored path query consistent with $S$.

We point out that while the result of learnerConjPath$_0$($S$) is a set of minimal queries, it is not necessarily a minimal conjunctive query i.e., it is not a maximal set of minimal queries. In the example below we show that a set of positive examples may have an exponential number of minimal Boolean path queries, and therefore constructing their conjunction cannot be done in polynomial time.

Example 6. Fix $n > 0$ and take the set of positive examples $S_{exp}$ containing exactly two trees

$t_0 = r(a_1(b_1(\ldots a_n(b_n(c))\ldots)))$,
$t_1 = r(b_1(a_1(\ldots b_n(a_n(c))\ldots)))$.

Any query of the form $r//\beta_1//\ldots//\beta_n//c$, with $\beta_i \in \{a_i, b_i\}$ and $i \in \{1, \ldots, n\}$, is a minimal Boolean path query consistent with $S_{exp}$.

6. LEARNING BOOLEAN TWIG QUERIES

In this section we investigate learning path-subsumption-free twig queries from positive examples i.e., the learning setting $PsfTwig_0 = (Tree_0, PsfTwig_0, L_0)$. Recall that $q \in PsfTwig_0$ is a query such that the set of root-to-leaf paths $Paths(q)$ consists of Boolean anchored path queries and does not contain two path queries such that one subsumes another. Our approach is based on the algorithm learnerConjPath$_0$, which infers a set $P$ of minimal Boolean path queries, and on a method that allows us to reconstruct a twig query from the path queries in $P$. Intuitively speaking, we shall interleave the path queries from $P$ to obtain the twig query. Below, we describe this technique formally.
Given a path query $p$ and a node $n \in N_p$, the split of $p$ at $n$ is a pair of path queries $p_1$ and $p_2$ such that $p_1$ is the path from $root_p$ to $n$ and $p_2$ is the path from $n$ to the only leaf of $p$. Note that $n$ becomes the root node of $p_2$. A fusion of $p$ into a twig query $q$ is a twig query $q'$ such that the pair $p_1$ and $p_2$ is a split of $p$ at $n$, there exists an embedding $\lambda : p_1 \to q$, and $q'$ is obtained from $q$ by attaching $p_2$ at the node $\lambda(n)$ (the node $\lambda(n)$ and the root node $n$ of $p_2$ become the same node, and the label of $n$ in $p_2$ is ignored). By $Fusions(p, q)$ we denote the set of all fusions of $p$ into $q$. Figure 11 presents all fusions of a path query $p_0$ into a twig query $q_0$. We point out that if $q$ is path-subsumption-free and $p$ is anchored, then all elements of $Fusions(p, q)$ are path-subsumption-free.

Figure 11: Fusions of $p_0$ into $q_0$.

We note that $Fusions(p, q)$ may be empty e.g., there is no fusion of / into []/, but as we argue next, this is never the case in the learning algorithm learnerPsfTwig$_0$, which we present in Figure 12. We slightly extend the notation: $\bot$ denotes a phantom empty twig query and $Fusions(p, \bot) = \{p\}$.

algorithm learnerPsfTwig$_0$($S$)
Input: a sample $S \subseteq Tree_0$ of trees
Output: a query $q \in PsfTwig_0$ such that $S \subseteq L_0(q)$
1: $q := \bot$
2: $P :=$ learnerConjPath$_0$($S$)
3: for $p \in P$ do
4:     $C := \{q' \in Fusions(p, q) \mid S \subseteq L_0(q')\}$
5:     $q :=$ choose any $\subseteq$-minimal element of $C$
6: return $q$

Figure 12: Learning algorithm for $PsfTwig_0$.

Basically, learnerPsfTwig$_0$ uses learnerConjPath$_0$ to construct a set $P$ of Boolean path queries satisfied in all trees of $S$ and then fuses all the paths into one twig query. Note that $C$ is never empty because $q$ is built up from path queries in $P$ that are satisfied in $S$ and have the same label in their root nodes. Consequently, learnerPsfTwig$_0$ executes without errors and is sound. The order in which learnerPsfTwig$_0$ performs fusions is arbitrary, but later on we show that in the presence of the characteristic sample the set $C$ has exactly one element at all times, and the final result is the goal query. First, we illustrate the work of learnerPsfTwig$_0$ on an example.

Example 7. Consider a sample $S_1$ containing the two DBLP listings in Figure 13: one with a collection of articles and the other with a collection of books.
Figure 13: Input sample (two trees rooted at dblp, one listing article entries and one listing book entries, with author, editor, title and url children).

learnerConjPath$_0$($S_1$) returns the following path queries: $p_1$ = dblp/⋆/author, $p_2$ = dblp/⋆/title, $p_3$ = dblp/⋆/url. We perform fusions in the order $p_1$, $p_2$, and $p_3$. Fusing $p_1$ and $p_2$ yields the query dblp/⋆[title]/author, and fusing $p_3$ into it gives $q$ = dblp[⋆/url]/⋆[title]/author. Note that in the last step, dblp/⋆[title][url]/author is one of the fusions but it is not consistent with the input sample $S_1$. On the

other hand, if the order of fusions is $p_2$, $p_3$, and $p_1$, then the end result is $q'$ = dblp[⋆/author]/⋆[title]/url.

While in the previous example the queries $q$ and $q'$ are minimal path-subsumption-free twig queries consistent with $S_1$, in general learnerPsfTwig$_0$ does not need to produce such minimal queries. In fact, we show that for certain samples such a minimal query may be of exponential size and thus impossible to construct by a polynomial algorithm.

Example 8 (cont. Example 6). Recall the sample $S_{exp}$ and observe that the minimal twig query consistent with $S_{exp}$ has the shape of a perfect binary tree of height $n + 1$ where every node at depth $i \in \{0, \ldots, n-1\}$ has two children labeled with $a_{i+1}$ and $b_{i+1}$ (connected with their parent with a //-edge). Naturally, this minimal query is path-subsumption-free.

Now, we move to completeness of learnerPsfTwig$_0$, and we fix a query $q \in PsfTwig_0$ and a sample $S \subseteq L_0(q)$. Recall the construction of the characteristic sample $CS_q$ for $q$ from Section 3.1. First, we observe that for $q \in PsfTwig_0$ every $p \in Paths(q)$ is a $\subseteq$-minimal element of $Paths(q)$. As a simple consequence of Lemma 5.1 we get the following.

Lemma 6.1. If $S$ contains $CS_q$, then learnerConjPath$_0$($S$) returns $Paths(q)$.

To state that the algorithm approaches the goal query $q$ with every fusion, we need to define formally the search space of subqueries of $q$ and show that when moving with the fusion operator we never leave the space and finally reach $q$. A Boolean twig query $q'$ is a subquery of $q$ if there exists a subset $N$ of leaves of $q$ such that $q'$ is the subgraph induced by the set of paths from the root of $q$ to the leaves in $N$. The main claim follows.

Lemma 6.2. Assume that $CS_q \subseteq S$. For any subquery $q'$ of $q$ and any path query $p \in Paths(q) \setminus Paths(q')$, the set of elements of $Fusions(p, q')$ consistent with $S$ has exactly one $\subseteq$-minimal element $q''$. Furthermore, $q''$ is a subquery of $q$.

If $CS_q \subseteq S$, then by Lemma 6.1 $P = Paths(q)$, and therefore, whatever the order of choosing paths from $P$ in line 3, the algorithm learnerPsfTwig$_0$ approaches $q$, and when all paths in $P$ are fused, we obtain $q$.

Theorem 6.3. Path-subsumption-free Boolean twig queries are learnable in polynomial time and data from positive examples (i.e., in the setting $PsfTwig_0$).

7. LEARNING UNARY TWIG QUERIES

In this section, we present an algorithm learnerPsfTwig$_1$ (Figure 14) for learning unary path-subsumption-free twig queries from positive examples i.e., in the learning setting $PsfTwig_1 = (Tree_1, PsfTwig_1, L_1)$. Essentially, the learning algorithm uses learnerAnchPath$_1$ to construct a path query $p$ and then it uses learnerPsfTwig$_1$($S$, $q'$), a helper learner derived from learnerPsfTwig$_0$, to decorate the nodes of the path query with filter expressions (Boolean twig queries). Here, we use the non-abbreviated syntax of XPath to represent the path query $p$ as $l_0/\alpha_1{::}l_1/\ldots/\alpha_k{::}l_k$, where $l_i \in \Sigma \cup \{\star\}$ and $\alpha_i$ is either child or descendant.

algorithm learnerPsfTwig$_1$($S$, $q'$)
Input: a sample $S \subseteq Tree_1$ of decorated trees and a query $q' \in PsfTwig_1$ such that $S \subseteq L_1(q')$
Output: a query $q \in PsfTwig_1$ s.t. $q \subseteq q'$ and $S \subseteq L_1(q)$
This algorithm is obtained from learnerPsfTwig$_0$ by:
- initializing $q$ to $q'$ (line 1),
- replacing every $L_0$ by $L_1$.

algorithm learnerPsfTwig$_1$($S$)
Input: a sample $S$ of decorated trees
Output: a query $q \in PsfTwig_1$ such that $S \subseteq L_1(q)$
1: $p :=$ learnerAnchPath$_1$($S$)
2: let $p$ be of the form $l_0/\alpha_1{::}l_1/\ldots/\alpha_k{::}l_k$
3: $q_k := l_k$
4: for $i = k, \ldots, 0$ do
5:     $S_i := \emptyset$
6:     for $t \in S$ do
7:         let $n$ be the deepest node on the path from the root node $root_t$ to the selected node $sel_t$, such that $n$ is reachable from $root_t$ with $l_0/\alpha_1{::}l_1/\ldots/\alpha_i{::}l_i$ and $sel_t$ is reachable from $n$ with $q_i$
8:         add the subtree of $t$ rooted at $n$ to $S_i$
9:     $q'_i :=$ learnerPsfTwig$_1$($S_i$, $q_i$)
10:     if $i > 0$ then
11:         $q_{i-1} := l_{i-1}/\alpha_i{::}q'_i$
12: return $q'_0$

Figure 14: Learning algorithm for $PsfTwig_1$.

When decorating the $i$-th step of $p$ i.e., the fragment $\alpha_i{::}l_i$, with a filter expression, the algorithm first constructs a sample $S_i$ of subtrees that serve as positive examples for learning the corresponding filter expression. From every decorated tree in the input sample $S$ one subtree is extracted. Each subtree is rooted at a node $n$ on the path from the root node to the selected node of the decorated tree $t$. The choice of $n$ is made so that it can be reached with the unprocessed part of the path query $l_0/\alpha_1{::}l_1/\ldots/\alpha_i{::}l_i$ and at the same time the decorated part of the path query $q_i$ selects the selected node $sel_t$ when evaluated from $n$.
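The non-abbreviated form $l_0/\alpha_1{::}l_1/\ldots/\alpha_k{::}l_k$ used in Figure 14 can be obtained mechanically from the abbreviated notation of the earlier sections. The sketch below is only an illustration with a hypothetical function name, `unabbreviated`, and it writes the wildcard as `*`. It follows this paper's convention of reading // as the descendant axis; in standard XPath 1.0 the abbreviation // stands for /descendant-or-self::node()/, which is not the same thing.

```python
def unabbreviated(query: str) -> str:
    """Rewrite a relative path query such as 'library/*/title' as
    'library/child::*/child::title' (paper convention: '//' = descendant)."""
    axis_name = {"/": "child", "//": "descendant"}
    parts = []
    axis = None                      # no axis in front of the root step l_0
    for token in query.split("/"):
        if token == "":              # empty token: second slash of '//'
            axis = "//"
        elif axis is None:
            parts.append(token)
            axis = "/"
        else:
            parts.append(f"{axis_name[axis]}::{token}")
            axis = "/"
    return "/".join(parts)


print(unabbreviated("library/*/title"))
print(unabbreviated("offer//item//*"))
```

With the steps made explicit, the prefix $l_0/\alpha_1{::}l_1/\ldots/\alpha_i{::}l_i$ needed in line 7 is simply the first $i+1$ entries of `parts`.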
An important invariant of the outer for loop (lines 4-11) is that there is at least one such $n$ for every $t \in S$. If there is more than one possible choice, the deepest node is chosen.

Example 9. Consider a sample $S_2$ (Figure 15) that contains the positive examples corresponding to (a simplified version of) the document from Example 1.

Figure 15: Examples from the library database (two trees rooted at library: one with a collection node, one with a book node, each with title and author children).

learnerAnchPath$_1$($S_2$)

returns the query $p$ = /library/⋆/title. The algorithm attempts to specialize the bottom fragment $q_2$ = title using the two subtrees title(capital) and title(manifesto). The only Boolean anchored path query these subtrees have in common is title//⋆, which is fused into the query yielding $q'_2$ = title[.//⋆]. Next, the algorithm moves to $q_1$ = ⋆/title[.//⋆] and calls learnerPsfTwig$_1$ with two subtrees: one rooted at the node collection and one rooted at the node book. learnerConjPath$_0$ called with these two trees on input returns two path queries, ⋆/title//⋆ and ⋆/author/marx. The first path query is subsumed by $q_1$, and therefore it is absorbed by $q_1$ when fusing. Fusing the second path query into $q_1$ yields the query $q'_1$ = ⋆[author/marx]/title[.//⋆]. Finally, the algorithm moves a level up to the query $q_0 = q'_0$ = library/⋆[author/marx]/title[.//⋆], which is also the end result of learnerPsfTwig$_1$.

We observe that $q'_0$ can be considered overspecialized: it contains the filter expression [.//⋆], which tests that the selected title nodes have contents, a test trivially true in the presence of reasonable schema information. Currently, however, our algorithms do not take advantage of schema information.

The soundness of learnerPsfTwig$_1$ follows from the invariant of the main loop (lines 4-11): for every $t \in S$ in line 7 there is at least one node with the desired property. Completeness of learnerPsfTwig$_1$ follows essentially from completeness of the algorithms learnerAnchPath$_1$ and learnerPsfTwig$_0$, and from the fact that in line 7 we choose the deepest node.

Theorem 7.1. Path-subsumption-free unary twig queries are learnable in polynomial time and data from positive examples (i.e., in the setting $PsfTwig_1$).

8. IMPACT OF NEGATIVE EXAMPLES

In the previous sections, we considered the setting where the user provides positive examples only. In this section, we allow the user to additionally specify negative examples. We use two symbols $+$ and $-$ to mark whether an example $t$ of some query is a positive one $(t, +)$ or a negative one $(t, -)$. Formally, for $i \in \{0, 1\}$ we consider the following learning settings: $Path^{\pm}_i = (Tree^{\pm}_i, Path_i, L^{\pm}_i)$ and $Twig^{\pm}_i = (Tree^{\pm}_i, Twig_i, L^{\pm}_i)$, where $Tree^{\pm}_i = Tree_i \times \{+, -\}$ and $L^{\pm}_i(q) = L_i(q) \times \{+\} \cup (Tree_i \setminus L_i(q)) \times \{-\}$.
We study the problem of checking whether there even exists a query consistent with the input sample, because any sound learning algorithm needs to return Null if and only if there is no such query. Formally, given a learning setting $K = (D, C, L)$, the $K$-consistency is the following decision problem:

$CONS_K = \{S \subseteq D \mid \exists q \in C.\ S \subseteq L(q)\}$.

Note that in the presence of positive examples only, the consistency problem is trivial as long as the query class contains the universal query ⋆//⋆. In the presence of negative examples this problem becomes quite complex.

Theorem 8.1. $Twig^{\pm}_i$-consistency is NP-complete for any $i \in \{0, 1\}$ (even in the presence of one negative example).

Proof. We only outline the proof of NP-hardness of $Twig^{\pm}_0$-consistency with a reduction from SAT. Showing the membership in NP is more difficult, uses a nontrivial minimal-witness argument, and is omitted. We illustrate the reduction on an example of a CNF formula $\varphi_0 = (\neg x_1 \vee x_2 \vee \neg x_3) \wedge (x_1 \vee \neg x_2)$, for which the corresponding sample is presented in Figure 16 (positive and negative examples are indicated with the symbols $+$ and $-$ respectively).

The building block of the reduction is a bush tree, which is used to encode Boolean valuations and constraints on them. For instance, for the set of variables $\{x_1, x_2, x_3\}$ the full bush tree is $(x_1(0, 1), x_2(0, 1), x_3(0, 1))$, but typically we remove some of the leaves. For instance, the valuation $V_0 = \{(x_1, false), (x_2, false), (x_3, true)\}$ is represented by the tree $t_0 = (x_1(0), x_2(0), x_3(1))$. Note that the tree pattern whose root has $t_0$ as its only subtree separates the positive examples from the negative ones in Figure 16 because $V_0$ satisfies $\varphi_0$.

The constructed set of examples consists of several trees. The positive trees specify the satisfying valuations of the input CNF formula; there is one positive tree per clause of the input formula. Each of them contains one bush tree per literal of the clause, every bush tree encoding the valuations that satisfy the corresponding literal (one leaf removed). The negative tree ensures that a bush filter that separates the positive examples from the negative one is well-formed and encodes a valuation. This tree contains one bush tree per variable of the input formula, and every bush tree has both leaves of the corresponding variable $x_i$ removed.
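The bush trees of the reduction are easy to generate mechanically. The following sketch builds the bush tree of a valuation, of a literal, and of a variable in the negative example. It is an illustration under stated assumptions: trees are encoded as `(label, children)` tuples, the function names are hypothetical, and the root label `"b"` is a placeholder, since the label used at the root of a bush tree is not recoverable from the text above.

```python
from typing import Dict, List, Tuple

Tree = Tuple[str, tuple]  # (label, children)


def bush_tree(leaves: Dict[str, List[int]], root: str = "b") -> Tree:
    """A bush tree: one child per variable, each keeping the listed leaves.
    The root label 'b' is a placeholder assumption."""
    return (root, tuple((var, tuple((str(v), ()) for v in vals))
                        for var, vals in leaves.items()))


def valuation_bush(valuation: Dict[str, bool]) -> Tree:
    """Encode a valuation: each variable keeps only the leaf of its value."""
    return bush_tree({x: [int(v)] for x, v in valuation.items()})


def literal_bush(variables: List[str], var: str, positive: bool) -> Tree:
    """Encode the valuations satisfying one literal (one leaf removed)."""
    leaves = {x: [0, 1] for x in variables}
    leaves[var] = [1] if positive else [0]
    return bush_tree(leaves)


def negative_bush(variables: List[str], var: str) -> Tree:
    """Bush tree of the negative example: both leaves of `var` removed."""
    leaves = {x: [0, 1] for x in variables}
    leaves[var] = []
    return bush_tree(leaves)


def show(tree: Tree) -> str:
    """Render a tree in the term notation used in the text."""
    label, children = tree
    if not children:
        return label
    return label + "(" + ",".join(show(c) for c in children) + ")"


print(show(valuation_bush({"x1": False, "x2": False, "x3": True})))
print(show(literal_bush(["x1", "x2", "x3"], "x1", positive=False)))
print(show(negative_bush(["x1", "x2", "x3"], "x2")))
```

A positive example for a clause would then collect the `literal_bush` trees of its literals under a common root, and the negative example would collect one `negative_bush` tree per variable.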
We claim that this set of examples is consistent if and only if the input CNF formula is satisfiable. The if part is trivial, and the proof of the only if part is technical and uses the observation that the depth of any twig query separating the positive examples from the negative ones is bounded by 4.

The result holds even for very limited query classes that do not use //-edges and ⋆, and in particular the result holds for path-subsumption-free twig queries. The problem of consistency of the input sample in the presence of positive and negative examples has also been considered for string patterns and found to be NP-complete [28]. The proof can be easily adapted to show the following.

Theorem 8.2. $Path^{\pm}_i$-consistency is NP-complete for any $i \in \{0, 1\}$.

We remark, however, that the proof cannot be extended to twig queries because these are much more expressive even when interpreted over linear trees (words). Overall, the negative results for checking consistency give us

Corollary 8.3. Unless P = NP, none of the classes $Path_i$ and $Twig_i$ for $i \in \{0, 1\}$ is learnable in polynomial time and data in the presence of positive and negative examples.

9. RELATED WORK

Our research adheres to computational learning theory [22], a branch of machine learning, and in particular, to the area