Journal of Artificial Intelligence Research 31 (2008) 157-204
Submitted 06/07; published 01/08
© 2008 AI Access Foundation. All rights reserved.

Conjunctive Query Answering for the Description Logic SHIQ

Birte Glimm (birte.glimm@comlab.ox.ac.uk)
Ian Horrocks (ian.horrocks@comlab.ox.ac.uk)
Oxford University Computing Laboratory, UK

Carsten Lutz (clu@tcs.inf.tu-dresden.de)
Dresden University of Technology, Germany

Ulrike Sattler (sattler@cs.man.ac.uk)
The University of Manchester, UK

Abstract

Conjunctive queries play an important role as an expressive query language for Description Logics (DLs). Although modern DLs usually provide for transitive roles, conjunctive query answering over DL knowledge bases is only poorly understood if transitive roles are admitted in the query. In this paper, we consider unions of conjunctive queries over knowledge bases formulated in the prominent DL SHIQ and allow transitive roles in both the query and the knowledge base. We show decidability of query answering in this setting and establish two tight complexity bounds: regarding combined complexity, we prove that there is a deterministic algorithm for query answering that needs time single exponential in the size of the KB and double exponential in the size of the query, which is optimal. Regarding data complexity, we prove containment in co-NP.

1. Introduction

Description Logics (DLs) are a family of logic-based knowledge representation formalisms (Baader, Calvanese, McGuinness, Nardi, & Patel-Schneider, 2003). Most DLs are fragments of First-Order Logic restricted to unary and binary predicates, which are called concepts and roles in DLs. The constructors for building complex expressions are usually chosen such that the key inference problems, such as concept satisfiability, are decidable and preferably of low computational complexity. A DL knowledge base (KB) consists of a TBox, which contains intensional knowledge such as concept definitions and general background knowledge, and an ABox, which contains extensional knowledge and is used to describe individuals. Using a database metaphor, the TBox corresponds to the schema, and the ABox corresponds to the data. In contrast to databases, however, DL knowledge bases adopt an open world semantics, i.e., they represent information about the domain in an incomplete way.

Standard DL reasoning services include testing concepts for satisfiability and retrieving certain instances of a given concept. The latter retrieves, for a knowledge base consisting of an ABox $\mathcal{A}$ and a TBox $\mathcal{T}$, all (ABox) individuals that are instances of the given (possibly complex) concept expression $C$, i.e., all those individuals $a$ such that $\mathcal{T}$ and $\mathcal{A}$ entail that $a$ is an instance of $C$. The underlying reasoning problems are well-understood, and it is known that the combined complexity of these reasoning problems, i.e., the complexity measured in the size of the TBox, the ABox, and the query, is ExpTime-complete for SHIQ (Tobies, 2001). The data complexity of a reasoning problem is measured in the size of the ABox only. Whenever the TBox and the query are small compared to the ABox, as is often the case in practice, the data complexity gives a more useful performance estimate. For SHIQ, instance retrieval is known to be data complete for co-NP (Hustadt, Motik, & Sattler, 2005).

Despite the high worst case complexity of the standard reasoning problems for very expressive DLs such as SHIQ, there are highly optimized implementations available, e.g., FaCT++ (Tsarkov & Horrocks, 2006), KAON2¹, Pellet (Sirin, Parsia, Cuenca Grau, Kalyanpur, & Katz, 2006), and RacerPro². These systems are used in a wide range of applications, e.g., configuration (McGuinness & Wright, 1998), bio informatics (Wolstencroft, Brass, Horrocks, Lord, Sattler, Turi, & Stevens, 2005), and information integration (Calvanese, De Giacomo, Lenzerini, Nardi, & Rosati, 1998b). Most prominently, DLs are known for their use as a logical underpinning of ontology languages, e.g., OIL, DAML+OIL, and OWL (Horrocks, Patel-Schneider, & van Harmelen, 2003), which is a W3C recommendation (Bechhofer, van Harmelen, Hendler, Horrocks, McGuinness, Patel-Schneider, & Stein, 2004).

In data-intensive applications, querying KBs plays a central role. Instance retrieval is, in some aspects, a rather weak form of querying: although possibly complex concept expressions are used as queries, we can only query for tree-like relational structures, i.e., a DL concept cannot express arbitrary cyclic structures. This property is known as the tree model property and is considered an important reason for the decidability of most Modal and Description Logics (Grädel, 2001; Vardi, 1997). Conjunctive queries (CQs) are well known in the database community and constitute an expressive query language with capabilities that go well beyond standard instance retrieval.

For an example, consider a knowledge base that contains an ABox assertion $(\exists \mathit{hasSon}.(\exists \mathit{hasDaughter}.\top))(\mathit{Mary})$, which informally states that the individual (or constant in FOL terms) Mary has a son who has a daughter; hence, that Mary is a grandmother. Additionally, we assume that both roles hasSon and hasDaughter have a transitive super-role hasDescendant. This implies that Mary is related via the role hasDescendant to her (anonymous) grandchild. For this knowledge base, Mary is clearly an answer to the conjunctive query $\mathit{hasSon}(x, y) \wedge \mathit{hasDaughter}(y, z) \wedge \mathit{hasDescendant}(x, z)$, when we assume that $x$ is a distinguished variable (also called answer or free variable) and $y$, $z$ are non-distinguished (existentially quantified) variables.

If all variables in the query are non-distinguished, the query answer is just true or false and the query is called a Boolean query. Given a knowledge base $\mathcal{K}$ and a Boolean CQ $q$, the query entailment problem is deciding whether $q$ is true or false w.r.t. $\mathcal{K}$. If a CQ contains distinguished variables, the answers to the query are those tuples of individual names for which the knowledge base entails the query that is obtained by replacing the free variables with the individual names in the answer tuple. The problem of finding all answer tuples is known as query answering. Since query entailment is a decision problem and thus better suited for complexity analysis than query answering, we concentrate on query entailment. This is no restriction since query answering can easily be reduced to query entailment as we illustrate in more detail in Section 2.2.

Footnotes:
1. http://kaon2.semanticweb.org
2. http://www.racer-systems.com
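To make the introductory example concrete, the following Python sketch builds one hand-picked interpretation and checks the query in it. The element names `son` and `granddaughter` are hypothetical labels for the anonymous individuals, and checking a single interpretation only illustrates the idea: entailment quantifies over all models of the knowledge base, which this sketch does not attempt.

```python
from itertools import product

def transitive_closure(pairs):
    """Return the transitive closure of a binary relation given as a set of pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure

# One hand-built interpretation. "son" and "granddaughter" are hypothetical
# names for the anonymous elements required by the ABox assertion about Mary.
domain = {"Mary", "son", "granddaughter"}
has_son = {("Mary", "son")}
has_daughter = {("son", "granddaughter")}
# hasDescendant is a transitive super-role of hasSon and hasDaughter.
has_descendant = transitive_closure(has_son | has_daughter)

def answers_in_model():
    """Elements x for which some y, z satisfy all three query atoms."""
    return {
        x
        for x, y, z in product(domain, repeat=3)
        if (x, y) in has_son and (y, z) in has_daughter and (x, z) in has_descendant
    }
```

In this interpretation the closure step adds the pair linking Mary to her grandchild, which is exactly the edge the third query atom needs.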

Devising a decision procedure for conjunctive query entailment in expressive DLs such as SHIQ is a challenging problem, in particular when transitive roles are admitted in the query (Glimm, Horrocks, & Sattler, 2006). In the conference version of this paper, we presented the first decision procedure for conjunctive query entailment in SHIQ. In this paper, we generalize this result to unions of conjunctive queries (UCQs) over SHIQ knowledge bases. We achieve this by rewriting a conjunctive query into a set of conjunctive queries such that each resulting query is either tree-shaped (i.e., it can be expressed as a concept) or grounded (i.e., it contains only constants/individual names and no variables). The entailment of both types of queries can be reduced to standard reasoning problems (Horrocks & Tessaris, 2000; Calvanese, De Giacomo, & Lenzerini, 1998a).

The paper is organized as follows: in Section 2, we give the necessary definitions, followed by a discussion of related work in Section 3. In Section 4, we motivate the query rewriting steps by means of an example. In Section 5, we give formal definitions for the rewriting procedure and show that a Boolean query is indeed entailed by a knowledge base $\mathcal{K}$ iff the disjunction of the rewritten queries is entailed by $\mathcal{K}$. In Section 6, we present a deterministic algorithm for UCQ entailment in SHIQ that runs in time single exponential in the size of the knowledge base and double exponential in the size of the query. Since the combined complexity of conjunctive query entailment is already 2ExpTime-hard for the DL ALCI (Lutz, 2007), it follows that this problem is 2ExpTime-complete for SHIQ. This shows that conjunctive query entailment for SHIQ is strictly harder than instance checking, which is also the case for simple DLs such as EL (Rosati, 2007b). We further show that (the decision problem corresponding to) conjunctive query answering in SHIQ is co-NP-complete regarding data complexity, and thus not harder than instance retrieval.

The presented decision procedure gives not only insight into query answering; it also has an immediate consequence on the field of extending DL knowledge bases with rules. From the work by Rosati (2006a, Thm. 11), the consistency of a SHIQ knowledge base extended with (weakly-safe) Datalog rules is decidable iff the entailment of unions of conjunctive queries in SHIQ is decidable. Hence, we close this open problem as well.

This paper is an extended version of the conference paper: Conjunctive Query Answering for the Description Logic SHIQ. Proceedings of the Twentieth International Joint Conference on Artificial Intelligence (IJCAI 07), Jan 06-12, 2007.

2. Preliminaries

We introduce the basic terms and notations used throughout the paper. In particular, we introduce the DL SHIQ (Horrocks, Sattler, & Tobies, 2000) and (unions of) conjunctive queries.

2.1 Syntax and Semantics of SHIQ

Let $N_C$, $N_R$, and $N_I$ be countably infinite sets of concept names, role names, and individual names. We assume that the set of role names contains a subset $N_{tR} \subseteq N_R$ of transitive role names. A role is an element of $N_R \cup \{r^- \mid r \in N_R\}$, where roles of the form $r^-$ are called inverse roles. A role inclusion is of the form $r \sqsubseteq s$ with $r$, $s$ roles. A role hierarchy $\mathcal{R}$ is a finite set of role inclusions.

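An interpretation $\mathcal{I} = (\Delta^\mathcal{I}, \cdot^\mathcal{I})$ consists of a non-empty set $\Delta^\mathcal{I}$, the domain of $\mathcal{I}$, and a function $\cdot^\mathcal{I}$, which maps every concept name $A$ to a subset $A^\mathcal{I} \subseteq \Delta^\mathcal{I}$, every role name $r \in N_R$ to a binary relation $r^\mathcal{I} \subseteq \Delta^\mathcal{I} \times \Delta^\mathcal{I}$, every role name $r \in N_{tR}$ to a transitive binary relation $r^\mathcal{I} \subseteq \Delta^\mathcal{I} \times \Delta^\mathcal{I}$, and every individual name $a$ to an element $a^\mathcal{I} \in \Delta^\mathcal{I}$. An interpretation $\mathcal{I}$ satisfies a role inclusion $r \sqsubseteq s$ if $r^\mathcal{I} \subseteq s^\mathcal{I}$ and a role hierarchy $\mathcal{R}$ if it satisfies all role inclusions in $\mathcal{R}$.

We use the following standard notation:

1. We define the function $\mathsf{Inv}$ over roles as $\mathsf{Inv}(r) := r^-$ if $r \in N_R$ and $\mathsf{Inv}(r) := s$ if $r = s^-$ for a role name $s$.
2. For a role hierarchy $\mathcal{R}$, we define $\sqsubseteq^*_\mathcal{R}$ as the reflexive transitive closure of $\sqsubseteq$ over $\mathcal{R} \cup \{\mathsf{Inv}(r) \sqsubseteq \mathsf{Inv}(s) \mid r \sqsubseteq s \in \mathcal{R}\}$. We use $r \equiv_\mathcal{R} s$ as an abbreviation for $r \sqsubseteq^*_\mathcal{R} s$ and $s \sqsubseteq^*_\mathcal{R} r$.
3. For a role hierarchy $\mathcal{R}$ and a role $s$, we define the set $\mathsf{Trans}_\mathcal{R}$ of transitive roles as $\{s \mid$ there is a role $r$ with $r \equiv_\mathcal{R} s$ and $r \in N_{tR}$ or $\mathsf{Inv}(r) \in N_{tR}\}$.
4. A role $r$ is called simple w.r.t. a role hierarchy $\mathcal{R}$ if, for each role $s$ such that $s \sqsubseteq^*_\mathcal{R} r$, $s \notin \mathsf{Trans}_\mathcal{R}$.

The subscript $\mathcal{R}$ of $\sqsubseteq^*_\mathcal{R}$ and $\mathsf{Trans}_\mathcal{R}$ is dropped if clear from the context. The set of SHIQ-concepts (or concepts for short) is the smallest set built inductively from $N_C$ using the following grammar, where $A \in N_C$, $n \in \mathbb{N}$, $r$ is a role and $s$ is a simple role:

$C ::= \top \mid \bot \mid A \mid \neg C \mid C_1 \sqcap C_2 \mid C_1 \sqcup C_2 \mid \forall r.C \mid \exists r.C \mid {\leqslant}\, n\, s.C \mid {\geqslant}\, n\, s.C.$

Given an interpretation $\mathcal{I}$, the semantics of SHIQ-concepts is defined as follows:

$\top^\mathcal{I} = \Delta^\mathcal{I}$
$\bot^\mathcal{I} = \emptyset$
$(\neg C)^\mathcal{I} = \Delta^\mathcal{I} \setminus C^\mathcal{I}$
$(C \sqcap D)^\mathcal{I} = C^\mathcal{I} \cap D^\mathcal{I}$
$(C \sqcup D)^\mathcal{I} = C^\mathcal{I} \cup D^\mathcal{I}$
$(\forall r.C)^\mathcal{I} = \{d \in \Delta^\mathcal{I} \mid \text{if } (d, d') \in r^\mathcal{I}, \text{ then } d' \in C^\mathcal{I}\}$
$(\exists r.C)^\mathcal{I} = \{d \in \Delta^\mathcal{I} \mid \text{there is a } (d, d') \in r^\mathcal{I} \text{ with } d' \in C^\mathcal{I}\}$
$({\leqslant}\, n\, s.C)^\mathcal{I} = \{d \in \Delta^\mathcal{I} \mid \sharp(s^\mathcal{I}(d, C)) \leq n\}$
$({\geqslant}\, n\, s.C)^\mathcal{I} = \{d \in \Delta^\mathcal{I} \mid \sharp(s^\mathcal{I}(d, C)) \geq n\}$

where $\sharp(M)$ denotes the cardinality of the set $M$ and $s^\mathcal{I}(d, C)$ is defined as $\{d' \in \Delta^\mathcal{I} \mid (d, d') \in s^\mathcal{I} \text{ and } d' \in C^\mathcal{I}\}$.

A general concept inclusion (GCI) is an expression $C \sqsubseteq D$, where both $C$ and $D$ are concepts. A finite set of GCIs is called a TBox. An interpretation $\mathcal{I}$ satisfies a GCI $C \sqsubseteq D$ if $C^\mathcal{I} \subseteq D^\mathcal{I}$, and a TBox $\mathcal{T}$ if it satisfies each GCI in $\mathcal{T}$.

An (ABox) assertion is an expression of the form $C(a)$, $r(a, b)$, $\neg r(a, b)$, or $a \not\doteq b$, where $C$ is a concept, $r$ is a role, and $a, b \in N_I$.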
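The role-hierarchy notions of this section (the function Inv, the closure of the role inclusions, the set of transitive roles, and simple roles) can be computed directly from their definitions. The sketch below is only an illustration under stated assumptions: roles are encoded as strings, an inverse role is written with a trailing `-`, and the function names are hypothetical, not taken from any reasoner.

```python
def inv(role):
    """Inv(r): a role is a name such as 'r' or an inverse written 'r-' (assumed encoding)."""
    return role[:-1] if role.endswith("-") else role + "-"

def sub_role_closure(role_names, inclusions):
    """Reflexive-transitive closure of the inclusions plus their inverted copies."""
    roles = set(role_names) | {inv(r) for r in role_names}
    below = {(r, r) for r in roles}                      # reflexivity
    below |= set(inclusions)
    below |= {(inv(r), inv(s)) for r, s in inclusions}   # Inv(r) below Inv(s)
    changed = True
    while changed:                                       # transitivity
        changed = False
        for (a, b) in list(below):
            for (c, d) in list(below):
                if b == c and (a, d) not in below:
                    below.add((a, d))
                    changed = True
    return roles, below

def trans_roles(roles, below, transitive_names):
    """Roles equivalent to some r such that r or Inv(r) is a transitive role name."""
    return {
        s for s in roles
        if any((r, s) in below and (s, r) in below
               and (r in transitive_names or inv(r) in transitive_names)
               for r in roles)
    }

def simple_roles(roles, below, trans):
    """A role is simple if none of its sub-roles (itself included) is transitive."""
    return {r for r in roles
            if not any((s, r) in below and s in trans for s in roles)}

# Hierarchy of the introductory example: hasSon and hasDaughter are
# sub-roles of the transitive role hasDescendant.
roles, below = sub_role_closure(
    {"hasSon", "hasDaughter", "hasDescendant"},
    {("hasSon", "hasDescendant"), ("hasDaughter", "hasDescendant")},
)
trans = trans_roles(roles, below, {"hasDescendant"})
simple = simple_roles(roles, below, trans)
```

For this hierarchy, hasDescendant and its inverse are the only non-simple roles, so neither may appear in a number restriction.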
An ABox is a finite set of assertions. We use $\mathsf{Inds}(\mathcal{A})$ to denote the set of individual names occurring in $\mathcal{A}$. An interpretation $\mathcal{I}$ satisfies an assertion $C(a)$ if $a^\mathcal{I} \in C^\mathcal{I}$, $r(a, b)$ if $(a^\mathcal{I}, b^\mathcal{I}) \in r^\mathcal{I}$, $\neg r(a, b)$ if $(a^\mathcal{I}, b^\mathcal{I}) \notin r^\mathcal{I}$, and $a \not\doteq b$ if $a^\mathcal{I} \neq b^\mathcal{I}$. An interpretation $\mathcal{I}$ satisfies an ABox $\mathcal{A}$ if it satisfies each assertion in $\mathcal{A}$, which we denote with $\mathcal{I} \models \mathcal{A}$.

A knowledge base (KB) is a triple $(\mathcal{T}, \mathcal{R}, \mathcal{A})$ with $\mathcal{T}$ a TBox, $\mathcal{R}$ a role hierarchy, and $\mathcal{A}$ an ABox. Let $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ be a KB and $\mathcal{I} = (\Delta^\mathcal{I}, \cdot^\mathcal{I})$ an interpretation. We say that $\mathcal{I}$ satisfies $\mathcal{K}$ if $\mathcal{I}$ satisfies $\mathcal{T}$, $\mathcal{R}$, and $\mathcal{A}$. In this case, we say that $\mathcal{I}$ is a model of $\mathcal{K}$ and write $\mathcal{I} \models \mathcal{K}$. We say that $\mathcal{K}$ is consistent if $\mathcal{K}$ has a model.

2.1.1 Extending SHIQ to $\mathcal{SHIQ}^\sqcap$

In the following section, we show how we can reduce a conjunctive query to a set of ground or tree-shaped conjunctive queries. During the reduction, we may introduce concepts that contain an intersection of roles under existential quantification. We define, therefore, the extension of SHIQ with role conjunction/intersection, denoted as $\mathcal{SHIQ}^\sqcap$ and, in the appendix, we show how to decide the consistency of $\mathcal{SHIQ}^\sqcap$ knowledge bases.

In addition to the constructors introduced for SHIQ, $\mathcal{SHIQ}^\sqcap$ allows for concepts of the form

$C ::= \forall R.C \mid \exists R.C \mid {\leqslant}\, n\, S.C \mid {\geqslant}\, n\, S.C,$

where $R := r_1 \sqcap \ldots \sqcap r_n$, $S := s_1 \sqcap \ldots \sqcap s_n$, $r_1, \ldots, r_n$ are roles, and $s_1, \ldots, s_n$ are simple roles. The interpretation function is extended such that $(r_1 \sqcap \ldots \sqcap r_n)^\mathcal{I} = r_1^\mathcal{I} \cap \ldots \cap r_n^\mathcal{I}$.

2.2 Conjunctive Queries and Unions of Conjunctive Queries

We now introduce Boolean conjunctive queries since they are the basic form of queries we are concerned with. We later also define non-Boolean queries and show how they can be reduced to Boolean queries. Finally, unions of conjunctive queries are just a disjunction of conjunctive queries.

For simplicity, we write a conjunctive query as a set instead of as a conjunction of atoms. For example, we write the introductory example from Section 1 as $\{\mathit{hasSon}(x, y), \mathit{hasDaughter}(y, z), \mathit{hasDescendant}(x, z)\}$.
For non-Boolean queries, i.e., when we consider the problem of query answering, the answer variables are often given in the head of the query, e.g., $(x_1, x_2, x_3) \leftarrow \{\mathit{hasSon}(x_1, x_2), \mathit{hasDaughter}(x_2, x_3), \mathit{hasDescendant}(x_1, x_3)\}$ indicates that the query answers are those tuples $(a_1, a_2, a_3)$ of individual names that, substituted for $x_1$, $x_2$, and $x_3$ respectively, result in a Boolean query that is entailed by the knowledge base. For simplicity and since we mainly focus on query entailment, we do not use a query head even in the case of a non-Boolean query. Instead, we explicitly say which variables are answer variables and which ones are existentially quantified. We now give a definition of Boolean conjunctive queries.

Definition 1. Let $N_V$ be a countably infinite set of variables disjoint from $N_C$, $N_R$, and $N_I$. A term $t$ is an element from $N_V \cup N_I$. Let $C$ be a concept, $r$ a role, and $t, t'$ terms. An atom is an expression $C(t)$, $r(t, t')$, or $t \approx t'$ and we refer to these three different types of atoms as concept atoms, role atoms, and equality atoms respectively. A Boolean conjunctive query $q$ is a non-empty set of atoms. We use $\mathsf{Vars}(q)$ to denote the set of (existentially quantified) variables occurring in $q$, $\mathsf{Inds}(q)$ to denote the set of individual names occurring in $q$, and $\mathsf{Terms}(q)$ for the set of terms in $q$, where $\mathsf{Terms}(q) = \mathsf{Vars}(q) \cup \mathsf{Inds}(q)$. If all terms in $q$ are individual names, we say that $q$ is ground. A sub-query of $q$ is simply a subset of $q$ (including $q$ itself). As usual, we use $\sharp(q)$ to denote the cardinality of $q$, which is simply the number of atoms in $q$, and we use $|q|$ for the size of $q$, i.e., the number of symbols necessary to write $q$. A SHIQ conjunctive query is a conjunctive query in which all concepts $C$ that occur in a concept atom $C(t)$ are SHIQ-concepts.

Since equality is reflexive, symmetric and transitive, we define $\approx^*$ as the transitive, reflexive, and symmetric closure of $\approx$ over the terms in $q$. Hence, the relation $\approx^*$ is an equivalence relation over the terms in $q$ and, for $t \in \mathsf{Terms}(q)$, we use $[t]$ to denote the equivalence class of $t$ by $\approx^*$.

Let $\mathcal{I} = (\Delta^\mathcal{I}, \cdot^\mathcal{I})$ be an interpretation. A total function $\pi\colon \mathsf{Terms}(q) \to \Delta^\mathcal{I}$ is an evaluation if (i) $\pi(a) = a^\mathcal{I}$ for each individual name $a \in \mathsf{Inds}(q)$ and (ii) $\pi(t) = \pi(t')$ for all $t \approx^* t'$. We write

$\mathcal{I} \models^\pi C(t)$ if $\pi(t) \in C^\mathcal{I}$;
$\mathcal{I} \models^\pi r(t, t')$ if $(\pi(t), \pi(t')) \in r^\mathcal{I}$;
$\mathcal{I} \models^\pi t \approx t'$ if $\pi(t) = \pi(t')$.

If, for an evaluation $\pi$, $\mathcal{I} \models^\pi at$ for all atoms $at \in q$, we write $\mathcal{I} \models^\pi q$. We say that $\mathcal{I}$ satisfies $q$ and write $\mathcal{I} \models q$ if there exists an evaluation $\pi$ such that $\mathcal{I} \models^\pi q$. We call such a $\pi$ a match for $q$ in $\mathcal{I}$.

Let $\mathcal{K}$ be a SHIQ knowledge base and $q$ a conjunctive query. If $\mathcal{I} \models \mathcal{K}$ implies $\mathcal{I} \models q$, we say that $\mathcal{K}$ entails $q$ and write $\mathcal{K} \models q$. The query entailment problem is defined as follows: given a knowledge base $\mathcal{K}$ and a query $q$, decide whether $\mathcal{K} \models q$.

For brevity and simplicity of notation, we define the relation $\dot{\in}$ over atoms in $q$ as follows: $C(t) \mathrel{\dot{\in}} q$ if there is a term $t' \in \mathsf{Terms}(q)$ such that $t \approx^* t'$ and $C(t') \in q$, and $r(t_1, t_2) \mathrel{\dot{\in}} q$ if there are terms $t_1', t_2' \in \mathsf{Terms}(q)$ such that $t_1 \approx^* t_1'$, $t_2 \approx^* t_2'$, and $r(t_1', t_2') \in q$ or $\mathsf{Inv}(r)(t_2', t_1') \in q$.
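The notion of a match from Definition 1 can be illustrated for a finite interpretation by trying every assignment of domain elements to the variables. The sketch below is an assumption-laden illustration only: the tuple encoding of atoms, the dictionary layout of the interpretation, and the names `find_match` and `holds` are all hypothetical, and a match in one interpretation says nothing about entailment, which ranges over every model.

```python
from itertools import product

def holds(atom, pi, interp):
    """Check one atom under the evaluation pi."""
    kind = atom[0]
    if kind == "concept":                       # ("concept", C, t)
        return pi[atom[2]] in interp["concepts"][atom[1]]
    if kind == "role":                          # ("role", r, t, t2)
        return (pi[atom[2]], pi[atom[3]]) in interp["roles"][atom[1]]
    return pi[atom[1]] == pi[atom[2]]           # ("eq", t, t2)

def find_match(query, variables, interp):
    """Return an evaluation satisfying every atom of the query, or None."""
    variables = sorted(variables)
    for values in product(sorted(interp["domain"]), repeat=len(variables)):
        pi = dict(interp["individuals"])        # condition (i): pi(a) = a^I
        pi.update(zip(variables, values))
        # Condition (ii) is enforced by requiring every equality atom to hold.
        if all(holds(atom, pi, interp) for atom in query):
            return pi
    return None

# A small interpretation for the introductory example; "m", "s" and "g"
# are hypothetical domain elements.
interp = {
    "domain": {"m", "s", "g"},
    "individuals": {"Mary": "m"},
    "concepts": {},
    "roles": {
        "hasSon": {("m", "s")},
        "hasDaughter": {("s", "g")},
        "hasDescendant": {("m", "s"), ("s", "g"), ("m", "g")},
    },
}
query = {
    ("role", "hasSon", "Mary", "y"),
    ("role", "hasDaughter", "y", "z"),
    ("role", "hasDescendant", "Mary", "z"),
}
match = find_match(query, {"y", "z"}, interp)
```

The search is exponential in the number of variables, which is acceptable for a toy interpretation but is not how the decision procedure of this paper works.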
This relation over atoms is clearly justified by the definition of the semantics, in particular, because $\mathcal{I} \models r(t, t')$ implies that $\mathcal{I} \models \mathsf{Inv}(r)(t', t)$.

When devising a decision procedure for CQ entailment, most complications arise from cyclic queries (Calvanese et al., 1998a; Chekuri & Rajaraman, 1997). In this context, when we say cyclic, we mean that the graph structure induced by the query is cyclic, i.e., the graph obtained from $q$ such that each term is considered as a node and each role atom induces an edge. Since, in the presence of inverse roles, a query containing the role atom $r(t, t')$ is equivalent to the query obtained by replacing this atom with $\mathsf{Inv}(r)(t', t)$, the direction of the edges is not important and we say that a query is cyclic if its underlying undirected graph structure is cyclic. Please note also that multiple role atoms for two terms are not considered as a cycle, e.g., the query $\{r(t, t'), s(t, t')\}$ is not a cyclic query. The following is a more formal definition of this property.

Definition 2. A query $q$ is cyclic if there exists a sequence of terms $t_1, \ldots, t_n$ with $n > 3$ such that

1. for each $i$ with $1 \leq i < n$, there exists a role atom $r_i(t_i, t_{i+1}) \mathrel{\dot{\in}} q$,
2. $t_1 = t_n$, and
3. $t_i \not\approx^* t_j$ for $1 \leq i < j < n$.

In the above definition, Item 3 makes sure that we do not consider queries as cyclic just because they contain two terms $t, t'$ for which there are more than two role atoms using the two terms. Please note that we use the relation $\dot{\in}$ here, which implicitly uses the relation $\approx^*$ and abstracts from the directedness of role atoms.

In the following, if we write that we replace $r(t, t') \mathrel{\dot{\in}} q$ with $s(t_1, t_2), \ldots, s(t_{n-1}, t_n)$ for $t = t_1$ and $t' = t_n$, we mean that we first remove any occurrences of $r(\hat{t}, \hat{t}')$ and $\mathsf{Inv}(r)(\hat{t}', \hat{t})$ such that $\hat{t} \approx^* t$ and $\hat{t}' \approx^* t'$ from $q$, and then add the atoms $s(t_1, t_2), \ldots, s(t_{n-1}, t_n)$ to $q$.

W.l.o.g., we assume that queries are connected. More precisely, let $q$ be a conjunctive query. We say that $q$ is connected if, for all $t, t' \in \mathsf{Terms}(q)$, there exists a sequence $t_1, \ldots, t_n$ such that $t_1 = t$, $t_n = t'$ and, for all $1 \leq i < n$, there exists a role $r$ such that $r(t_i, t_{i+1}) \mathrel{\dot{\in}} q$. A collection $q_1, \ldots, q_n$ of queries is a partitioning of $q$ if $q = q_1 \cup \ldots \cup q_n$, $q_i \cap q_j = \emptyset$ for $1 \leq i < j \leq n$, and each $q_i$ is connected.

Lemma 3. Let $\mathcal{K}$ be a knowledge base, $q$ a conjunctive query, and $q_1, \ldots, q_n$ a partitioning of $q$. Then $\mathcal{K} \models q$ iff $\mathcal{K} \models q_i$ for each $i$ with $1 \leq i \leq n$.

A proof is given by Tessaris (2001, 7.3.2) and, with this lemma, it is clear that the restriction to connected queries is indeed w.l.o.g. since entailment of $q$ can be decided by checking entailment of each $q_i$ at a time. In what follows, we therefore assume queries to be connected without further notice.

Definition 4. A union of Boolean conjunctive queries is a formula $q_1 \vee \ldots \vee q_n$, where each disjunct $q_i$ is a Boolean conjunctive query. A knowledge base $\mathcal{K}$ entails a union of Boolean conjunctive queries $q_1 \vee \ldots \vee q_n$, written as $\mathcal{K} \models q_1 \vee \ldots \vee q_n$, if, for each interpretation $\mathcal{I}$ such that $\mathcal{I} \models \mathcal{K}$, there is some $i$ such that $\mathcal{I} \models q_i$ and $1 \leq i \leq n$.

W.l.o.g. we assume that the variable names in each disjunct are different from the variable names in the other disjuncts. This can always be achieved by naming variables apart. We further assume that each disjunct is a connected conjunctive query. This is w.l.o.g. since a UCQ which contains unconnected disjuncts can always be transformed into conjunctive normal form; we can then decide entailment for each resulting conjunct separately and each conjunct is a union of connected conjunctive queries. We describe this transformation now in more detail and, for a more convenient notation, we write a conjunctive query $\{at_1, \ldots, at_k\}$ as $at_1 \wedge \ldots \wedge at_k$ in the following proof, instead of the usual set notation.

Lemma 5. Let $\mathcal{K}$ be a knowledge base, $q = q_1 \vee \ldots \vee q_n$ a union of conjunctive queries such that, for $1 \leq i \leq n$, $q_i^1, \ldots, q_i^{k_i}$ is a partitioning of the conjunctive query $q_i$. Then $\mathcal{K} \models q$ iff

$\mathcal{K} \models \bigwedge_{(i_1, \ldots, i_n) \in \{1, \ldots, k_1\} \times \ldots \times \{1, \ldots, k_n\}} (q_1^{i_1} \vee \ldots \vee q_n^{i_n}).$
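The graph-based notions above (connectedness, partitioning, cyclicity in the sense of Definition 2, and the conjuncts of Lemma 5) can be sketched with a union-find structure. This is a simplified illustration under stated assumptions: atoms are encoded as tuples, equality atoms are left out so that every term is its own equivalence class, and all function names are hypothetical.

```python
from itertools import product

def _find(parent, t):
    """Union-find lookup with path halving."""
    parent.setdefault(t, t)
    while parent[t] != t:
        parent[t] = parent[parent[t]]
        t = parent[t]
    return t

def partition(query):
    """Split a query into connected sub-queries.

    Atoms are ("role", r, t, t2) or ("concept", C, t).
    """
    parent = {}
    for atom in query:
        if atom[0] == "role":
            parent[_find(parent, atom[2])] = _find(parent, atom[3])
        else:
            _find(parent, atom[2])
    parts = {}
    for atom in query:
        parts.setdefault(_find(parent, atom[2]), set()).add(atom)
    return list(parts.values())

def is_cyclic(query):
    """Cycle test on the underlying undirected graph.

    Parallel role atoms collapse into one edge and self-loops are ignored,
    so only cycles through at least three distinct terms are reported.
    """
    edges = {frozenset((a[2], a[3]))
             for a in query if a[0] == "role" and a[2] != a[3]}
    parent = {}
    for edge in edges:
        u, v = (_find(parent, t) for t in edge)
        if u == v:          # both ends already connected: this edge closes a cycle
            return True
        parent[u] = v
    return False

def cnf_conjuncts(partitioned_disjuncts):
    """One conjunct per choice of a component from every disjunct (Lemma 5)."""
    return [list(choice) for choice in product(*partitioned_disjuncts)]
```

For disjuncts with $k_1, \ldots, k_n$ components, `cnf_conjuncts` returns $k_1 \cdot \ldots \cdot k_n$ unions of connected queries, which is the source of the exponential growth in the number of entailment tests.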

A detailed proof of Lemma 5 is again given by Tessaris (2001, 7.3.3). Please note that, due to the transformation into conjunctive normal form, the resulting number of unions of connected conjunctive queries for which we have to test entailment can be exponential in the size of the original query. When analysing the complexity of the decision procedures presented in Section 6, we show that the assumption that each CQ in a UCQ is connected does not increase the complexity.

We now make the connection between query entailment and query answering clearer. For query answering, let the variables of a conjunctive query be typed: each variable can either be existentially quantified (also called non-distinguished) or free (also called distinguished or answer variables). Let $q$ be a query in $n$ variables (i.e., $\sharp(\mathsf{Vars}(q)) = n$), of which $v_1, \ldots, v_m$ ($m \leq n$) are answer variables. The answers of $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ to $q$ are those $m$-tuples $(a_1, \ldots, a_m) \in \mathsf{Inds}(\mathcal{A})^m$ such that, for all models $\mathcal{I}$ of $\mathcal{K}$, $\mathcal{I} \models^\pi q$ for some $\pi$ that satisfies $\pi(v_i) = a_i^\mathcal{I}$ for all $i$ with $1 \leq i \leq m$. It is not hard to see that the answers of $\mathcal{K}$ to $q$ can be computed by testing, for each $(a_1, \ldots, a_m) \in \mathsf{Inds}(\mathcal{A})^m$, whether the query $q_{[v_1, \ldots, v_m / a_1, \ldots, a_m]}$ obtained from $q$ by replacing each occurrence of $v_i$ with $a_i$ for $1 \leq i \leq m$ is entailed by $\mathcal{K}$. The answer to $q$ is then the set of all $m$-tuples $(a_1, \ldots, a_m)$ for which $\mathcal{K} \models q_{[v_1, \ldots, v_m / a_1, \ldots, a_m]}$. Let $k = \sharp(\mathsf{Inds}(\mathcal{A}))$ be the number of individual names used in the ABox $\mathcal{A}$. Since $\mathcal{A}$ is finite, clearly $k$ is finite. Hence, deciding which tuples belong to the set of answers can be checked with at most $k^m$ entailment tests. This is clearly not very efficient, but optimizations can be used, e.g., to identify a (hopefully small) set of candidate tuples. The algorithm that we present in Section 6 decides query entailment.
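The reduction from query answering to at most $k^m$ entailment tests can be written down in a few lines. In the sketch below, `entails` is a hypothetical callback standing in for a decision procedure for $\mathcal{K} \models q$; nothing here implements such a procedure. The encoding of an atom as a pair of a predicate name and a tuple of terms is likewise an assumption made for the example.

```python
from itertools import product

def substitute(query, binding):
    """Replace answer variables by individual names in every atom.

    Atoms are (predicate, terms) pairs, e.g. ("hasSon", ("x", "y")).
    """
    return {(pred, tuple(binding.get(t, t) for t in terms))
            for pred, terms in query}

def answers(query, answer_vars, individuals, entails):
    """Collect every tuple whose grounded query passes the entailment test.

    `entails` is a caller-supplied (hypothetical) oracle; it is invoked
    once per candidate tuple, i.e. k**m times for k individuals and
    m answer variables.
    """
    result = set()
    for tup in product(sorted(individuals), repeat=len(answer_vars)):
        grounded = substitute(query, dict(zip(answer_vars, tup)))
        if entails(grounded):
            result.add(tup)
    return result
```

A real system would prune the candidate tuples before calling the oracle; the exhaustive loop only mirrors the counting argument in the text.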
The reasons for devising a decision procedure for query entailment instead of query answering are twofold: first, query answering can be reduced to query entailment as shown above; second, in contrast to query answering, query entailment is a decision problem and can be studied in terms of complexity theory.

In the remainder of this paper, if not stated otherwise, we use $q$ (possibly with subscripts) for a connected Boolean conjunctive query, $\mathcal{K}$ for a SHIQ knowledge base $(\mathcal{T}, \mathcal{R}, \mathcal{A})$, $\mathcal{I}$ for an interpretation $(\Delta^\mathcal{I}, \cdot^\mathcal{I})$, and $\pi$ for an evaluation.

3. Related Work

Very recently, an automata-based decision procedure for positive existential path queries over $\mathcal{ALCQI}b_{reg}$ knowledge bases has been presented (Calvanese, Eiter, & Ortiz, 2007). Positive existential path queries generalize unions of conjunctive queries and since a SHIQ knowledge base can be polynomially reduced to an $\mathcal{ALCQI}b_{reg}$ knowledge base, the presented algorithm is a decision procedure for (union of) conjunctive query entailment in SHIQ as well. The automata-based technique can be considered more elegant than our rewriting algorithm but, unlike our technique, it does not give an NP upper bound for the data complexity.

Most existing algorithms for conjunctive query answering in expressive DLs assume, however, that role atoms in conjunctive queries use only roles that are not transitive. As a consequence, the example query from the introductory section cannot be answered. Under this restriction, decision procedures for various DLs around SHIQ are known (Horrocks & Tessaris, 2000; Ortiz, Calvanese, & Eiter, 2006b), and it is known that answering conjunctive queries in this setting is data complete for co-NP (Ortiz et al., 2006b). Another common restriction is that only individuals named in the ABox are considered for the assignments of variables. In this setting, the semantics of queries is no longer the standard First-Order one. With this restriction, the answer to the example query from the introduction would be false since Mary is the only named individual. It is not hard to see that conjunctive query answering with this restriction can be reduced to standard instance retrieval by replacing the variables with individual names from the ABox and then testing the entailment of each conjunct separately. Most of the implemented DL reasoners, e.g., KAON2, Pellet, and RacerPro, provide an interface for conjunctive query answering in this setting and employ several optimizations to improve the performance (Sirin & Parsia, 2006; Motik, Sattler, & Studer, 2004; Wessel & Möller, 2005). Pellet appears to be the only reasoner that also supports the standard First-Order semantics for SHIQ conjunctive queries under the restriction that the queries are acyclic.

To the best of our knowledge, it is still an open problem whether conjunctive query entailment is decidable in SHOIQ. Regarding undecidability results, it is known that conjunctive query entailment in the two variable fragment of First-Order Logic $\mathcal{L}^2$ is undecidable (Rosati, 2007a) and Rosati identifies a relatively small set of constructors that causes the undecidability.

Query entailment and answering have also been studied in the context of databases with incomplete information (Rosati, 2006b; van der Meyden, 1998; Grahne, 1991). In this setting, DLs can be used as schema languages, but the expressivity of the considered DLs is much lower than the expressivity of SHIQ. For example, the constructors provided by logics of the DL-Lite family (Calvanese, De Giacomo, Lembo, Lenzerini, & Rosati, 2007) are chosen such that the standard reasoning tasks are in PTime and query entailment is in LogSpace with respect to data complexity.
Futhemoe, TBox easoning can be done independently of the ABox and the ABox can be stoed and accessed using a standad database SQL engine. Since the consideed DLs ae consideable less expessive than SHIQ, the techniques used in databases with incomplete infomation cannot be applied in ou setting. Regading the quey language, it is well known that an extension of conjunctive queies with inequalities is undecidable (Calvanese et al., 1998a). Recently, it has futhe been shown that even fo DLs with low expessivity, an extension of conjunctive queies with inequalities o safe ole negation leads to undecidability (Rosati, 2007a). A elated easoning poblem is quey containment. Given a schema (o TBox) S and two queies q and q, we have that q is contained in q w..t. S iff evey intepetation I that satisfies S and q also satisfies q. It is well known that quey containment w..t. a TBox can be educed to deciding quey entailment fo (unions of) conjunctive queies w..t. a knowledge base (Calvanese et al., 1998a). Hence a decision pocedue fo (unions of) conjunctive queies in SHIQ can also be used fo deciding quey containment w..t. to a SHIQ TBox. Entailment of unions of conjunctive queies is also closely elated to the poblem of adding ules to a DL knowledge base, e.g., in the fom of Datalog ules. Augmenting a DL KB with an abitay Datalog pogam easily leads to undecidability (Levy & Rousset, 1998). In ode to ensue decidability, the inteaction between the Datalog ules and the DL knowledge base is usually esticted by imposing a safeness condition. The DL+log famewok (Rosati, 2006a) povides the least estictive integation poposed so fa. Rosati 165

presents an algorithm that decides the consistency of a DL+log knowledge base by reducing the problem to entailment of unions of conjunctive queries, and he proves that decidability of UCQs in SHIQ implies the decidability of consistency for SHIQ+log knowledge bases.

4. Query Rewriting by Example

In this section, we motivate the ideas behind our query rewriting technique by means of examples. In the following section, we give precise definitions for all rewriting steps.

4.1 Forest Bases and Canonical Interpretations

The main idea is that we can focus on models of the knowledge base that have a kind of tree or forest shape. It is well known that one reason for Description and Modal Logics being so robustly decidable is that they enjoy some form of tree model property, i.e., every satisfiable concept has a model that is tree-shaped (Vardi, 1997; Grädel, 2001). When going from concept satisfiability to knowledge base consistency, we need to replace the tree model property with a form of forest model property, i.e., every consistent KB has a model that consists of a set of trees, where each root corresponds to a named individual in the ABox. The roots can be connected via arbitrary relational structures, induced by the role assertions given in the ABox. A forest model is, therefore, not a forest in the graph theoretic sense. Furthermore, transitive roles can introduce short-cut edges between elements within a tree or even between elements of different trees. Hence we talk of a form of forest model property.

We now define forest models and show that, for deciding query entailment, we can restrict our attention to forest models. The rewriting steps are then used to transform cyclic subparts of the query into tree-shaped ones such that there is a forest-shaped match for the rewritten query into the forest models. In order to make the forest model property even clearer, we also introduce forest bases, which are interpretations that interpret transitive roles in an unrestricted way, i.e., not necessarily in a transitive way.
For a forest base, we require in particular that all relationships between elements of the domain that can be inferred by transitively closing a role are omitted. In the following, we assume that the ABox contains at least one individual name, i.e., Inds(A) is non-empty. This is w.l.o.g. since we can always add an assertion ⊤(a) to the ABox for a fresh individual name a ∈ N_I. For readers familiar with tableau algorithms, it is worth noting that forest bases can also be thought of as those tableaux generated from a complete and clash-free completion tree (Horrocks et al., 2000).

Definition 6. Let ℕ denote the non-negative integers and ℕ* the set of all (finite) words over the alphabet ℕ. A tree T is a non-empty, prefix-closed subset of ℕ*. For w, w' ∈ T, we call w' a successor of w if w' = w · c for some c ∈ ℕ, where "·" denotes concatenation. We call w' a neighbor of w if w' is a successor of w or vice versa. The empty word ε is called the root.

A forest base for K is an interpretation J = (Δ^J, ·^J) that interprets transitive roles in an unrestricted (i.e., not necessarily transitive) way and, additionally, satisfies the following conditions:

T1 Δ^J ⊆ Inds(A) × ℕ* such that, for all a ∈ Inds(A), the set {w | (a, w) ∈ Δ^J} is a tree;

T2 if ((a, w), (a', w')) ∈ r^J, then either w = w' = ε or a = a' and w' is a neighbor of w;

T3 for each a ∈ Inds(A), a^J = (a, ε);

An interpretation I is canonical for K if there exists a forest base J for K such that I is identical to J except that, for all non-simple roles r, we have

r^I = r^J ∪ ⋃_{s ⊑*_R r, s ∈ Trans_R} (s^J)^+

In this case, we say that J is a forest base for I and if I ⊨ K we say that I is a canonical model for K.

For convenience, we extend the notion of successors and neighbors to elements in canonical models. Let I be a canonical model with (a, w), (a', w') ∈ Δ^I. We call (a', w') a successor of (a, w) if either a = a' and w' = w · c for some c ∈ ℕ or w = w' = ε. We call (a', w') a neighbor of (a, w) if (a', w') is a successor of (a, w) or vice versa.

Please note that the above definition implicitly relies on the unique name assumption (UNA) (cf. T3). This is w.l.o.g. as we can guess an appropriate partition among the individual names and replace the individual names in each partition with one representative individual name from that partition. In Section 6, we show how the partitioning of individual names can be used to simulate the UNA; hence, our decision procedure does not rely on the UNA. We also show that this does not affect the complexity.

Lemma 7. Let K be a SHIQ knowledge base and q = q_1 ∨ ... ∨ q_n a union of conjunctive queries. Then K ⊭ q iff there exists a canonical model I of K such that I ⊭ q.

A detailed proof is given in the appendix. Informally, for the only if direction, we can take an arbitrary counter-model for the query, which exists by assumption, and unravel all non-tree structures. Since, during the unraveling process, we only replace cycles in the model by infinite paths and leave the interpretation of concepts unchanged, the query is still not satisfied in the unravelled canonical model. The if direction of the proof is trivial.

4.2 The Running Example

We use the following Boolean query and knowledge base as a running example:

Example 8.
Let K = (T, R, A) be a SHIQ knowledge base with r, t ∈ N_tR, k ∈ ℕ,

T = { C_k ⊑ ≥k p.⊤, C_3 ⊑ ≥3 p.⊤, D_2 ⊑ ∃s⁻.⊤ ⊓ ∃t.⊤ }
R = { t ⊑ t⁻, s⁻ ⊑ r }
A = { r(a, b), (∃p.C_k ⊓ ∃p.C ⊓ ∃r⁻.C_3)(a), (∃p.D_1 ⊓ ∃r.D_2)(b) }

and q = {r(u, x), r(x, y), t(y, y), s(z, y), r(u, z)} with Inds(q) = ∅ and Vars(q) = {u, x, y, z}.

For simplicity, we choose to use a CQ instead of a UCQ. In case of a UCQ, the rewriting steps are applied to each disjunct separately.

Figure 1: A representation of a canonical interpretation I for K.

Figure 1 shows a representation of a canonical model I for the knowledge base K from Example 8. Each labeled node represents an element in the domain, e.g., the individual name a is represented by the node labeled (a, ε). The edges represent relationships between individuals. For example, we can read the r-labeled edge from (a, ε) to (b, ε) in both directions, i.e., (a^I, b^I) = ((a, ε), (b, ε)) ∈ r^I and (b^I, a^I) = ((b, ε), (a, ε)) ∈ (r⁻)^I. The short-cuts due to transitive roles are shown as dashed lines, while the relationship between the nodes that represent ABox individuals is shown in grey. Please note that we did not indicate the interpretations of all concepts in the figure.

Since I is a canonical model for K, the elements of the domain are pairs (a, w), where a indicates the individual name that corresponds to the root of the tree, i.e., a^I = (a, ε), and the elements in the second place form a tree according to our definition of trees. For each individual name a in our ABox, we can, therefore, easily define the tree rooted in a as {w | (a, w) ∈ Δ^I}.

Figure 2: A forest base for the interpretation represented by Figure 1.

Figure 2 shows a representation of a forest base for the interpretation from Figure 1 above. For simplicity, the interpretation of concepts is no longer shown. The two trees, rooted in (a, ε) and (b, ε) respectively, are now clear.

A graphical representation of the query q from Example 8 is shown in Figure 3, where the meaning of the nodes and edges is analogous to the ones given for interpretations.
We call this query a cyclic query since its underlying undirected graph is cyclic (cf. Definition 2). Figure 4 shows a match π for q and I and, although we consider only one canonical model here, it is not hard to see that the query is true in each model of the knowledge base, i.e., K ⊨ q.
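As an informal illustration, the cyclicity of the running query can be checked mechanically. The following Python sketch is ours and not part of the formal development: the function name `has_cycle` and the encoding of role atoms as `(role, term, term)` triples are assumptions, and, since Definition 2 is not repeated here, the sketch simply ignores self-loops and repeated atoms over the same pair of terms.

```python
def has_cycle(atoms):
    """Return True if the undirected graph underlying the role atoms is cyclic.

    Assumption of this sketch: self-loops such as t(y, y) and several atoms
    over the same pair of terms are ignored, so only cycles through at least
    three distinct terms are reported.
    """
    parent = {}

    def find(term):
        # Union-find lookup with path halving.
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]
            term = parent[term]
        return term

    seen_edges = set()
    for _role, s, t in atoms:
        edge = frozenset((s, t))
        if s == t or edge in seen_edges:
            continue
        seen_edges.add(edge)
        root_s, root_t = find(s), find(t)
        if root_s == root_t:
            # s and t were already connected, so this edge closes a cycle.
            return True
        parent[root_s] = root_t
    return False


# The query q from Example 8.
q = [("r", "u", "x"), ("r", "x", "y"), ("t", "y", "y"),
     ("s", "z", "y"), ("r", "u", "z")]
print(has_cycle(q))
```

For q, the edges u-x, x-y, and z-y connect all four variables, so the remaining edge u-z closes the cycle that the rewriting steps of the following sections have to remove.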

Figure 3: A graph representation of the query from Example 8.

Figure 4: A match π for the query q from Example 8 onto the model I from Figure 1.

The forest model property is also exploited in the query rewriting process. We want to rewrite q into a set of queries q_1,...,q_n of ground or tree-shaped queries such that K ⊨ q iff K ⊨ q_1 ∨ ... ∨ q_n. Since the resulting queries are ground or tree-shaped queries, we can explore the known techniques for deciding entailment of these queries. As a first step, we transform q into a set of forest-shaped queries. Intuitively, forest-shaped queries consist of a set of tree-shaped sub-queries, where the roots of these trees might be arbitrarily interconnected (by atoms of the form r(t, t')). A tree-shaped query is a special case of a forest-shaped query. We will call the arbitrarily interconnected terms of a forest-shaped query the root choice (or, for short, just roots). At the end of the rewriting process, we replace the roots with individual names from Inds(A) and transform the tree parts into a concept by applying the so called rolling-up or tuple graph technique (Tessaris, 2001; Calvanese et al., 1998a).

In the proof of the correctness of our procedure, we use the structure of the forest bases in order to explicate the transitive short-cuts used in the query match. By explicating we mean that we replace each role atom that is mapped to such a short-cut with a sequence of role atoms such that an extended match for the modified query uses only paths that are in the forest base.

4.3 The Rewriting Steps

The rewriting process for a query q is a six stage process. At the end of this process, the rewritten query may or may not be in a forest shape. As we show later, this "don't know" non-determinism does not compromise the correctness of the algorithm. In the first stage, we derive a collapsing q_co of q by adding (possibly several) equality atoms to q. Consider,

for example, the cyclic query q = {r(x, y), r(x, y'), s(y, z), s(y', z)} (see Figure 5), which can be transformed into a tree-shaped one by adding the equality atom y ≈ y'.

Figure 5: A representation of a cyclic query and of the tree-shaped query obtained by adding the atom y ≈ y' to the query depicted on the left hand side.

A common property of the next three rewriting steps is that they allow for substituting the implicit short-cut edges with explicit paths that induce the short-cut. The three steps aim at different cases in which these short-cuts can occur and we describe their goals and application now in more detail:

The second stage is called split rewriting. In a split rewriting we take care of all role atoms that are matched to transitive short-cuts connecting elements of two different trees and by-passing one or both of their roots. We substitute these short-cuts with either one or two role atoms such that the roots are included. In our running example, π maps u to (a, 3) and x to (b, ε). Hence I ⊨^π r(u, x), but the used r-edge is a transitive short-cut connecting the tree rooted in a with the tree rooted in b, and by-passing (a, ε). Similar arguments hold for the atom r(u, z), where the path that implies this short-cut relationship goes via the two roots (a, ε) and (b, ε). It is clear that r must be a non-simple role since, in the forest base J for I, there is no direct connection between different trees other than between the roots of the trees. Hence, (π(u), π(x)) ∈ r^I holds only because there is a role s ∈ Trans_R such that s ⊑*_R r. In case of our example, r itself is transitive. A split rewriting eliminates transitive short-cuts between different trees of a canonical model and adds the missing variables and role atoms matching the sequence of edges that induce the short-cut.

Figure 6: A split rewriting q_s for the query shown in Figure 3.

Figure 6 depicts the split rewriting

q_s = {r(u, ux), r(ux, x), r(x, y), t(y, y), s(z, y), r(u, ux), r(ux, x), r(x, z)}

of q that is obtained from q by replacing (i) r(u, x) with r(u, ux) and r(ux, x) and (ii) r(u, z) with r(u, ux), r(ux, x), and r(x, z). Please note that we both introduced a new variable (ux) and re-used an existing variable (x). Figure 7 shows a match for q_s and the canonical model I of K in which the two trees are only connected via the roots. For the rewritten query, we also guess a set of roots, which contains the variables that are mapped to the roots in the canonical model. For our running example, we guess that the set of roots is {ux, x}.

Figure 7: A split match π_s for the query q_s from Figure 6 onto the canonical interpretation from Figure 1.

In the third step, called loop rewriting, we eliminate loops for variables v that do not correspond to roots by replacing atoms r(v, v) with two atoms r(v, v') and r(v', v), where v' can either be a new or an existing variable in q. In our running example, we eliminate the loop t(y, y) as follows:

q_l = {r(u, ux), r(ux, x), r(x, y), t(y, y'), t(y', y), s(z, y), r(u, ux), r(ux, x), r(x, z)}

is the query obtained from q_s (see Figure 6) by replacing t(y, y) with t(y, y') and t(y', y) for a new variable y'. Please note that, since t is defined as transitive and symmetric, t(y, y) is still implied, i.e., the loop is also a transitive short-cut. Figure 8 shows the canonical interpretation I from Figure 1 with a match π_l for q_l. The introduction of the new variable y' is needed in this case since there is no variable that could be re-used and the individual (b, 22) is not in the range of the match π_s.

Figure 8: A loop rewriting q_l and a match for the canonical interpretation from Figure 1.

The fourth rewriting step, called forest rewriting, allows again the replacement of role atoms with sets of role atoms.
This allows the elimination of cycles that are within a single

tree. A forest rewriting q_f for our example can be obtained from q_l by replacing the role atom r(x, z) with r(x, y) and r(y, z), resulting in the query

q_f = {r(u, ux), r(ux, x), r(x, y), t(y, y'), t(y', y), s(z, y), r(u, ux), r(ux, x), r(x, y), r(y, z)}.

Clearly, this results in tree-shaped sub-queries, one rooted in ux and one rooted in x. Hence q_f is forest-shaped w.r.t. the root terms ux and x. Figure 9 shows the canonical interpretation I from Figure 1 with a match π_f for q_f.

Figure 9: A forest rewriting q_f and a forest match π_f for the canonical interpretation from Figure 1.

In the fifth step, we use the standard rolling-up technique (Horrocks & Tessaris, 2000; Calvanese et al., 1998a) and express the tree-shaped sub-queries as concepts. In order to do this, we traverse each tree in a bottom-up fashion and replace each leaf (labeled with a concept C, say) and its incoming edge (labeled with a role r, say) with the concept ∃r.C added to its predecessor. For example, the tree rooted in ux (i.e., the role atom r(u, ux)) can be replaced with the atom (∃r⁻.⊤)(ux). Similarly, the tree rooted in x (i.e., the role atoms r(x, y), r(y, z), s(z, y), t(y, y'), and t(y', y)) can be replaced with the atom

(∃r.((∃(r ⊓ Inv(s)).⊤) ⊓ (∃(t ⊓ Inv(t)).⊤)))(x).

Please note that we have to use role conjunctions in the resulting query in order to capture the semantics of multiple role atoms relating the same pair of variables.

Recall that, in the split rewriting, we have guessed that x and ux correspond to roots and, therefore, correspond to individual names in Inds(A). In the sixth and last rewriting step, we guess which variable corresponds to which individual name and replace the variables with the guessed names. A possible guess for our running example would be that ux corresponds to a and x to b. This results in the (ground) query

{(∃r⁻.⊤)(a), r(a, b), (∃r.((∃(r ⊓ Inv(s)).⊤) ⊓ (∃(t ⊓ Inv(t)).⊤)))(b)},

which is entailed by K.
Please note that we focused in the running example on the most reasonable rewriting. There are several other possible rewritings, e.g., we obtain another rewriting from q_f by replacing ux with b and x with a in the last step. For a UCQ, we apply the rewriting steps to each of the disjuncts separately.
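The rolling-up of the fifth step can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the function name `roll_up`, the triple encoding of role atoms, and the textual concept syntax are ours; concept atoms and equality atoms are left out (the running example has none); and the input is assumed to be tree-shaped, since the recursion would not terminate on a cyclic query.

```python
def roll_up(atoms, root):
    """Roll a tree-shaped set of role atoms up into a concept, given as a string.

    Assumptions of this sketch: atoms are (role, term, term) triples, the query
    is tree-shaped when read as an undirected graph, and there are no concept
    or equality atoms.
    """
    # roles[p][c] lists the roles leading from p to c. An atom r(c, p) is read
    # in the direction from p to c as Inv(r).
    roles = {}
    for role, s, t in atoms:
        roles.setdefault(s, {}).setdefault(t, []).append(role)
        roles.setdefault(t, {}).setdefault(s, []).append(f"Inv({role})")

    def con(term, parent):
        parts = []
        for child in sorted(roles.get(term, {})):
            if child == parent:
                continue
            labels = sorted(set(roles[term][child]))
            # Several atoms over the same pair of terms give a role conjunction.
            edge = labels[0] if len(labels) == 1 else "(" + " ⊓ ".join(labels) + ")"
            parts.append(f"∃{edge}.{con(child, term)}")
        if not parts:
            return "⊤"  # a leaf without concept atoms
        return parts[0] if len(parts) == 1 else "(" + " ⊓ ".join(parts) + ")"

    return con(root, None)


# The two sub-queries of q_f from the running example; y1 stands for y'.
sub_ux = [("r", "u", "ux")]
sub_x = [("r", "x", "y"), ("r", "y", "z"), ("s", "z", "y"),
         ("t", "y", "y1"), ("t", "y1", "y")]
print(roll_up(sub_ux, "ux"))
print(roll_up(sub_x, "x"))
```

Up to the order of conjuncts, the two printed concepts correspond to the concepts given above for the trees rooted in ux and x, with Inv(r) written in place of r⁻.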

At the end of the rewriting process, we have, for each disjunct, a set of ground queries and/or queries that were rolled-up into a single concept atom. The latter queries result from forest rewritings that are tree-shaped and have an empty set of roots. Such tree-shaped rewritings can match anywhere in a tree and can, thus, not be grounded. Finally, we check if our knowledge base entails the disjunction of all the rewritten queries. We show that there is a bound on the number of (forest-shaped) rewritings and hence on the number of queries produced in the rewriting process.

Summing up, the rewriting process for a connected conjunctive query q involves the following steps:

1. Build all collapsings of q.
2. Build all split rewritings of each collapsing w.r.t. a subset R of roots.
3. Build all loop rewritings of the split rewritings.
4. Build all (forest-shaped) forest rewritings of the loop rewritings.
5. Roll up each tree-shaped sub-query in a forest-rewriting into a concept atom and
6. replace the roots in R with individual names from the ABox in all possible ways.

Let q_1,...,q_n be the queries resulting from the rewriting process. In the next section, we define each rewriting step and prove that K ⊨ q iff K ⊨ q_1 ∨ ... ∨ q_n. Checking entailment for the rewritten queries can easily be reduced to KB consistency and any decision procedure for SHIQ KB consistency could be used in order to decide if K ⊨ q. We present one such decision procedure in Section 6.

5. Query Rewriting

In the previous section, we have used several terms, e.g., tree- or forest-shaped query, rather informally. In the following, we give definitions for the terms used in the query rewriting process. Once this is done, we formalize the query rewriting steps and prove the correctness of the procedure, i.e., we show that the forest-shaped queries obtained in the rewriting process can indeed be used for deciding whether a knowledge base entails the original query. We do not give the detailed proofs here, but rather some intuitions behind the proofs. Proofs in full detail are given in the appendix.
5.1 Tree- and Forest-Shaped Queries

In order to define tree- or forest-shaped queries more precisely, we use mappings between queries and trees or forests. Instead of mapping equivalence classes of terms by ≈* to nodes in a tree, we extend some well-known properties of functions as follows:

Definition 9. For a mapping f : A → B, we use dom(f) and ran(f) to denote f's domain A and range B, respectively. Given an equivalence relation ≈* on dom(f), we say that f is injective modulo ≈* if, for all a, a' ∈ dom(f), f(a) = f(a') implies a ≈* a' and we say that f is bijective modulo ≈* if f is injective modulo ≈* and surjective. Let q be a query. A tree mapping for q is a total function f from terms in q to a tree such that

1. f is bijective modulo ≈*,
2. if r(t, t') ∈ q, then f(t) is a neighbor of f(t'), and,
3. if a ∈ Inds(q), then f(a) = ε.

The query q is tree-shaped if ♯(Inds(q)) ≤ 1 and there is a tree mapping for q.

A root choice R for q is a subset of Terms(q) such that Inds(q) ⊆ R and, if t ∈ R and t ≈* t', then t' ∈ R. For t ∈ R, we use Reach(t) to denote the set of terms t' ∈ Terms(q) for which there exists a sequence of terms t_1,...,t_n ∈ Terms(q) such that

1. t_1 = t and t_n = t',
2. for all 1 ≤ i < n, there is a role r such that r(t_i, t_{i+1}) ∈ q, and,
3. for 1 < i ≤ n, if t_i ∈ R, then t_i ≈* t.

We call R a root splitting w.r.t. q if either R = ∅ or if, for t_i, t_j ∈ R, t_i ≉* t_j implies that Reach(t_i) ∩ Reach(t_j) = ∅. Each term t ∈ R induces a sub-query

subq(q, t) := {at ∈ q | the terms in at occur in Reach(t)} \ {r(t, t) | r(t, t) ∈ q}.

A query q is forest-shaped w.r.t. a root splitting R if either R = ∅ and q is tree-shaped or each sub-query subq(q, t) for t ∈ R is tree-shaped.

For each term t ∈ R, we collect the terms that are reachable from t in the set Reach(t). By Condition 3, we make sure that R and ≈* are such that each t' ∈ Reach(t) is either not in R or t ≈* t'. Since queries are connected by assumption, we would otherwise collect all terms in Reach(t) and not just those t' ∉ R. For a root splitting, we require that the resulting sets are mutually disjoint for all terms t, t' ∈ R that are not equivalent. This guarantees that all paths between the sub-queries go via the root nodes of their respective trees. Intuitively, a forest-shaped query is one that can potentially be mapped onto a canonical interpretation I = (Δ^I, ·^I) such that the terms in the root splitting R correspond to roots (a, ε) ∈ Δ^I. In the definition of subq(q, t), we exclude loops of the form r(t, t) ∈ q, as these parts of the query are grounded later in the query rewriting process and between ground terms, we allow arbitrary relationships.

Consider, for example, the query q_s of our running example from the previous section (cf. Figure 6). Let us again make the root choice R := {ux, x} for q. The sets Reach(ux) and Reach(x) w.r.t.
q_s and R are {ux, u} and {x, y, z} respectively. Since both sets are disjoint, R is a root splitting w.r.t. q_s. If we choose, however, R := {x, y}, the set R is not a root splitting w.r.t. q_s since Reach(x) = {ux, u, z} and Reach(y) = {z} are not disjoint.

5.2 From Graphs to Forests

We are now ready to define the query rewriting steps. Given an arbitrary query, we exhaustively apply the rewriting steps and show that we can use the resulting queries that are forest-shaped for deciding entailment of the original query. Please note that the following definitions are for conjunctive queries and not for unions of conjunctive queries since we apply the rewriting steps for each disjunct separately.
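Before turning to the rewriting steps, the Reach sets of Section 5.1 for the running example can be recomputed with the following Python sketch. It is an illustration only: the names `reach` and `is_root_splitting` and the triple encoding are ours, the sketch assumes a query without equality atoms (so that ≈* is the identity), it follows role atoms in both directions, and it always counts a root as reachable from itself.

```python
def reach(atoms, roots, t):
    """Terms reachable from the root t without passing through another root.

    Assumptions of this sketch: no equality atoms, role atoms are followed in
    both directions, and t itself is always part of the result.
    """
    adjacent = {}
    for _role, a, b in atoms:
        adjacent.setdefault(a, set()).add(b)
        adjacent.setdefault(b, set()).add(a)
    reached, todo = {t}, [t]
    while todo:
        current = todo.pop()
        for nxt in adjacent.get(current, ()):
            # Condition 3: a term on the path may only be a root if it is t.
            if nxt in reached or (nxt in roots and nxt != t):
                continue
            reached.add(nxt)
            todo.append(nxt)
    return reached


def is_root_splitting(atoms, roots):
    """Check that the Reach sets of distinct roots are pairwise disjoint."""
    sets = [reach(atoms, roots, t) for t in sorted(roots)]
    return all(a.isdisjoint(b) for i, a in enumerate(sets) for b in sets[i + 1:])


# The split rewriting q_s of the running example (duplicate atoms omitted).
q_s = [("r", "u", "ux"), ("r", "ux", "x"), ("r", "x", "y"),
       ("t", "y", "y"), ("s", "z", "y"), ("r", "x", "z")]
print(is_root_splitting(q_s, {"ux", "x"}))
print(is_root_splitting(q_s, {"x", "y"}))
```

With the roots {ux, x} the two sets are {ux, u} and {x, y, z}; with the roots {x, y} both sets contain z, so the second check fails.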

Definition 10. Let q be a Boolean conjunctive query. A collapsing q_co of q is obtained by adding zero or more equality atoms of the form t ≈ t' for t, t' ∈ Terms(q) to q. We use co(q) to denote the set of all queries that are a collapsing of q.

Let K be a SHIQ knowledge base. A query q_s is called a split rewriting of q w.r.t. K if it is obtained from q by choosing, for each atom r(t, t') ∈ q, to either:

1. do nothing,
2. choose a role s ∈ Trans_R such that s ⊑*_R r and replace r(t, t') with s(t, u), s(u, t'), or
3. choose a role s ∈ Trans_R such that s ⊑*_R r and replace r(t, t') with s(t, u), s(u, u'), s(u', t'),

where u, u' ∈ N_V are possibly fresh variables. We use sr_K(q) to denote the set of all pairs (q_s, R) for which there is a query q_co ∈ co(q) such that q_s is a split rewriting of q_co and R is a root splitting w.r.t. q_s.

A query q_l is called a loop rewriting of q w.r.t. a root splitting R and K if it is obtained from q by choosing, for all atoms of the form r(t, t) ∈ q with t ∉ R, a role s ∈ Trans_R such that s ⊑*_R r and by replacing r(t, t) with two atoms s(t, t') and s(t', t) for t' ∈ N_V a possibly fresh variable. We use lr_K(q) to denote the set of all pairs (q_l, R) for which there is a tuple (q_s, R) ∈ sr_K(q) such that q_l is a loop rewriting of q_s w.r.t. R and K.

For a forest rewriting, fix a set V ⊆ N_V of variables not occurring in q such that ♯(V) ≤ ♯(Vars(q)). A forest rewriting q_f w.r.t. a root splitting R of q and K is obtained from q by choosing, for each role atom r(t, t') such that either R = ∅ and r(t, t') ∈ q or there is some t_r ∈ R and r(t, t') ∈ subq(q, t_r), to either

1. do nothing, or
2. choose a role s ∈ Trans_R such that s ⊑*_R r and replace r(t, t') with ℓ ≤ ♯(Vars(q)) role atoms s(t_1, t_2),..., s(t_ℓ, t_{ℓ+1}), where t_1 = t, t_{ℓ+1} = t', and t_2,...,t_ℓ ∈ Vars(q) ∪ V.

We use fr_K(q) to denote the set of all pairs (q_f, R) for which there is a tuple (q_l, R) ∈ lr_K(q) such that q_f is a forest-shaped forest rewriting of q_l w.r.t. R and K.
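To give a feeling for the size of co(q), the following Python sketch enumerates collapsings as partitions of the set of terms. This is our illustration and rests on an assumption that is not part of Definition 10: collapsings whose equality atoms induce the same equivalence relation are identified, so that one collapsing per partition suffices. The helper names `partitions` and `equality_atoms` are hypothetical.

```python
def partitions(terms):
    """Yield every partition of the list `terms` as a list of blocks."""
    if not terms:
        yield []
        return
    first, rest = terms[0], terms[1:]
    for smaller in partitions(rest):
        # Put `first` into one of the existing blocks ...
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + smaller


def equality_atoms(partition):
    """Equality atoms linking the first term of each block to the others."""
    return [(block[0], other) for block in partition for other in block[1:]]


# Terms of the cyclic query from Figure 5.
terms = ["x", "y", "y'", "z"]
collapsings = [equality_atoms(p) for p in partitions(terms)]
print(len(collapsings))
print([("y", "y'")] in collapsings)
```

For four terms there are 15 partitions, one of which adds only the atom y ≈ y' and thereby yields the tree-shaped query of Figure 5.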
If K is clear from the context, we say that q' is a split, loop, or forest rewriting of q instead of saying that q' is a split, loop, or forest rewriting of q w.r.t. K. We assume that sr_K(q), lr_K(q), and fr_K(q) contain no isomorphic queries, i.e., differences in (newly introduced) variable names only are neglected.

In the next section, we show how we can build a disjunction of conjunctive queries q_1 ∨ ... ∨ q_ℓ from the queries in fr_K(q) such that each q_i for 1 ≤ i ≤ ℓ is either of the form C(v) for a single variable v ∈ Vars(q_i) or q_i is ground, i.e., q_i contains only constants and no variables. It then remains to show that K ⊨ q iff K ⊨ q_1 ∨ ... ∨ q_ℓ.

5.3 From Trees to Concepts

In order to transform a tree-shaped query into a single concept atom and a forest-shaped query into a ground query, we define a mapping f from the terms in each tree-shaped sub-query to a tree. We then incrementally build a concept that corresponds to the tree-shaped query by traversing the tree in a bottom-up fashion, i.e., from the leaves upwards to the root.

Definition 11. Let q be a tree-shaped query with at most one individual name. If a ∈ Inds(q), then let t_r = a, otherwise let t_r = v for some variable v ∈ Vars(q). Let f be a tree mapping such that f(t_r) = ε. We now inductively assign, to each term t ∈ Terms(q), a concept con(q, t) as follows:

if f(t) is a leaf of ran(f), then con(q, t) := ⊓_{C(t) ∈ q} C,

if f(t) has successors f(t_1),...,f(t_k), then con(q, t) := ⊓_{C(t) ∈ q} C ⊓ ⊓_{1 ≤ i ≤ k} ∃(⊓_{r(t, t_i) ∈ q} r).con(q, t_i).

Finally, the query concept of q w.r.t. t_r is con(q, t_r).

Please note that the above definition takes equality atoms into account. This is because the function f is bijective modulo ≈* and, in case there are concept atoms C(t) and C(t') for t ≈* t', both concepts are conjoined in the query concept due to the way membership of atoms in q is read. Similar arguments can be applied to the role atoms. The following lemma shows that query concepts indeed capture the semantics of q.

Lemma 12. Let q be a tree-shaped query with t_r ∈ Terms(q) as defined above, C_q = con(q, t_r), and I an interpretation. Then I ⊨ q iff there is a match π and an element d ∈ C_q^I such that π(t_r) = d.

The proof given by Horrocks, Sattler, Tessaris, and Tobies (1999) easily transfers from DLR to SHIQ. By applying the result from the above lemma, we can now transform a forest-shaped query into a ground query as follows:

Definition 13. Let (q_f, R) ∈ fr_K(q) for R ≠ ∅, and τ : R → Inds(A) a total function such that, for each a ∈ Inds(q), τ(a) = a and, for t, t' ∈ R, τ(t) = τ(t') iff t ≈* t'. We call such a mapping τ a ground mapping for R w.r.t. A. We obtain a ground query ground(q_f, R, τ) of q_f w.r.t. the root splitting R and ground mapping τ as follows: replace each t ∈ R with τ(t), and, for each a ∈ ran(τ), replace the sub-query q_a = subq(q_f, a) with con(q_a, a).

We define the set ground_K(q) of ground queries for q w.r.t. K as follows:

ground_K(q) := {q' | there exists some (q_f, R) ∈ fr_K(q) with R ≠ ∅ and some ground mapping τ w.r.t.
A and R such that q' = ground(q_f, R, τ)}

We define the set trees_K(q) of tree queries for q as follows:

trees_K(q) := {q' | there exists some (q_f, ∅) ∈ fr_K(q) and v ∈ Vars(q_f) such that q' = (con(q_f, v))(v)}
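The ground mappings of Definition 13 can be enumerated with a few lines of Python. The sketch below is ours and makes simplifying assumptions: the function name `ground_mappings` is hypothetical, the root splitting is given with one representative per ≈*-class, and it contains variables only, so that the condition τ(t) = τ(t') iff t ≈* t' reduces to injectivity.

```python
from itertools import permutations


def ground_mappings(roots, individuals):
    """Yield all injective assignments of root variables to individual names.

    Assumptions of this sketch: `roots` holds one representative per
    equivalence class and no individual names.
    """
    ordered_roots = sorted(roots)
    for names in permutations(sorted(individuals), len(ordered_roots)):
        yield dict(zip(ordered_roots, names))


# Two roots and two individual names admit two ground mappings ...
print(list(ground_mappings({"ux", "x"}, {"a", "b"})))
# ... whereas four pairwise non-equivalent roots admit none.
print(list(ground_mappings({"u", "x", "y", "z"}, {"a", "b"})))
```

A root splitting with more pairwise non-equivalent roots than there are individual names in the ABox therefore contributes no ground query at all.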

Going back to our running example, we have already seen that (q_f, {ux, x}) belongs to the set fr_K(q) for

q_f = {r(u, ux), r(ux, x), r(x, y), t(y, y'), t(y', y), s(z, y), r(y, z)}.

There are also several other queries in the set fr_K(q), e.g., (q, {u, x, y, z}), where q is the original query and the root splitting R is such that R = Terms(q), i.e., all terms are in the root choice for q. In order to build the set ground_K(q), we now build all possible ground mappings τ for the set Inds(A) of individual names in our ABox and the root splittings for the queries in fr_K(q). The tuple (q_f, {ux, x}) ∈ fr_K(q) contributes two ground queries for the set ground_K(q):

ground(q_f, {ux, x}, {ux ↦ a, x ↦ b}) = {r(a, b), (∃Inv(r).⊤)(a), (∃r.((∃(r ⊓ Inv(s)).⊤) ⊓ (∃(t ⊓ Inv(t)).⊤)))(b)},

where ∃Inv(r).⊤ is the query concept for the (tree-shaped) sub-query subq(q_f, ux) and ∃r.((∃(r ⊓ Inv(s)).⊤) ⊓ (∃(t ⊓ Inv(t)).⊤)) is the query concept for subq(q_f, x), and

ground(q_f, {ux, x}, {ux ↦ b, x ↦ a}) = {r(b, a), (∃Inv(r).⊤)(b), (∃r.((∃(r ⊓ Inv(s)).⊤) ⊓ (∃(t ⊓ Inv(t)).⊤)))(a)}.

The tuple (q, {u, x, y, z}) ∈ fr_K(q), however, does not contribute a ground query since, for a ground mapping, we require that τ(t) = τ(t') iff t ≈* t' and there are only two individual names in Inds(A) compared to four terms in q that need a distinct value. Intuitively, this is not a restriction, since in the first rewriting step (collapsing) we produce all those queries in which the terms of q have been identified with each other in all possible ways. In our example, K ⊨ q and K ⊨ q_1 ∨ ... ∨ q_ℓ, where q_1 ∨ ... ∨ q_ℓ are the queries from trees_K(q) and ground_K(q), since each model I of K satisfies q_i = ground(q_f, {ux, x}, {ux ↦ a, x ↦ b}).

5.4 Query Matches

Even if a query is true in a canonical model, it does not necessarily mean that the query is tree- or forest-shaped. However, a match π for a canonical interpretation can guide the process of rewriting a query. Similarly to the definition of tree- or forest-shaped queries, we define the shape of matches for a query.
In particular, we introduce three different kinds of matches: split matches, forest matches, and tree matches such that every tree match is a forest match, and every forest match is a split match. The correspondence to the query shapes is as follows: given a split match π, the set of all root nodes (a, ε) in the range of the match defines a root splitting for the query; if π is additionally a forest match, the query is forest-shaped w.r.t. the root splitting induced by π; and if π is additionally a tree match, then the whole query can be mapped to a single tree (i.e., the query is tree-shaped or forest-shaped w.r.t. an empty root splitting). Given an arbitrary query match into a canonical model, we can first obtain a split match and then a tree or forest match, by using the structure of the canonical model for guiding the application of the rewriting steps.

Definition 14. Let K be a SHIQ knowledge base, q a query, I = (Δ^I, ·^I) a canonical model of K, and π: Terms(q) → Δ^I an evaluation such that I ⊨^π q. We call π a split match if, for all r(t, t') ∈ q, one of the following holds:

1. π(t) = (a, ε) and π(t') = (b, ε) for some a, b ∈ Inds(A); or
2. π(t) = (a, w) and π(t') = (a, w') for some a ∈ Inds(A) and w, w' ∈ ℕ*.

We call π a forest match if, additionally, for each term t_r ∈ Terms(q) with π(t_r) = (a, ε) and a ∈ Inds(A), there is a total and bijective mapping f from {(a, w) | (a, w) ∈ ran(π)} to a tree T such that r(t, t') ∈ subq(q, t_r) implies that f(π(t)) is a neighbor of f(π(t')). We call π a tree match if, additionally, there is an a ∈ Inds(A) such that each element in ran(π) is of the form (a, w).

A split match π for a canonical interpretation induces a (possibly empty) root splitting R such that t ∈ R iff π(t) = (a, ε) for some a ∈ Inds(A). We call R the root splitting induced by π.

For two elements (a, w) and (a, w') in a canonical model, the path from (a, w) to (a, w') is the sequence (a, w_1),...,(a, w_n) where w = w_1, w' = w_n, and, for 1 ≤ i < n, w_{i+1} is a successor of w_i. The length of the path is n. Please note that, for a forest match, we do not require that w is a neighbor of w' or vice versa. This still allows role atoms to be mapped to paths in the canonical model of length greater than two, but such paths must be between ancestors and not between elements in different branches of the tree. The mapping f to a tree also makes sure that if R is the induced root splitting, then each sub-query subq(q, t) for t ∈ R is tree-shaped. For a tree match, the root splitting is either empty or t ≈* t' for each t, t' ∈ R, i.e., there is a single root modulo ≈*, and the whole query is tree-shaped.

5.5 Correctness of the Query Rewriting

The following lemmas state the correctness of the rewriting step by step for each of the rewriting stages. Full proofs are given in the appendix. As motivated in the previous section, we can use a given canonical model to guide the rewriting process such that we obtain a forest-shaped query that also has a match into the model.

Lemma 15. Let I be a model for K.

1. If I ⊨ q, then there is a collapsing q_co of q such that I ⊨^{π_co} q_co for π_co an injection modulo ≈*.
2.
If I ⊨^{π_co} q_co for a collapsing q_co of q, then I ⊨ q.

Given a model I that satisfies q, we can simply add equality atoms for all pairs of terms that are mapped to the same element in I. It is not hard to see that this results in a mapping that is injective modulo ≈*. For the second part, it is easy to see that a model that satisfies a collapsing also satisfies the original query.

Lemma 16. Let I be a model for K.

1. If I is canonical and I ⊨^π q, then there is a pair (q_s, R) ∈ sr_K(q) and a split match π_s such that I ⊨^{π_s} q_s, R is the induced root splitting of π_s, and π_s is an injection modulo ≈*.
2. If (q_s, R) ∈ sr_K(q) and I ⊨^{π_s} q_s for some match π_s, then I ⊨ q.

For the first part of the lemma, we proceed exactly as illustrated in the example section and use the canonical model I and the match π to guide the rewriting steps. We first build a collapsing q_co ∈ co(q) as described in the proof of Lemma 15 such that I ⊨^{π_co} q_co for π_co an injection modulo ≈*. Since I is canonical, paths between different trees can only occur due to non-simple roles, and thus we can replace each role atom that uses such a short-cut with two or three role atoms such that these roots are explicitly included in the query (cf. the query and match in Figure 4 and the obtained split rewriting with a split match in Figure 7). The second part of the lemma follows immediately from the fact that we use only transitive sub-roles in the replacement.

Lemma 17. Let I be a model of K.

1. If I is canonical and I ⊨ q, then there is a pair (q_l, R) ∈ lr_K(q) and a mapping π_l such that I ⊨^{π_l} q_l, π_l is an injection modulo ≈*, R is the root splitting induced by π_l and, for each r(t, t) ∈ q_l, t ∈ R.
2. If (q_l, R) ∈ lr_K(q) and I ⊨^{π_l} q_l for some match π_l, then I ⊨ q.

The second part is again straightforward, given that we can only use transitive sub-roles in the loop rewriting. For the first part, we proceed again as described in the examples section and use the canonical model I and the match π to guide the rewriting process. We first build a split rewriting q_s and its root splitting R as described in the proof of Lemma 16 such that (q_s, R) ∈ sr_K(q) and I ⊨^{π_s} q_s for a split match π_s. Since I is a canonical model, it has a forest base J. In a forest base, non-root nodes cannot be successors of themselves, so each such loop is a short-cut due to some transitive role. An element that is, say, r-related to itself has, therefore, a neighbor that is both an r- and Inv(r)-successor. Depending on whether this neighbor is already in the range of the match, we can either re-use an existing variable or introduce a new one, when making this path explicit (cf.
the loop ewiting depicted in Figue 8 obtained fom the split ewiting shown in Figue 7). Lemma 18. Let I be a model of K. 1. If I is canonical and I = q, then thee is a pai (q f, R) f K (q) such that I = π f q f fo a foest match π f, R is the induced oot splitting of π f, and π f is an injection modulo *. 2. If (q f, R) f K (q) and I = π f q f fo some match π f, then I = q. The main challenge is again the poof of (1) and we just give a shot idea of it hee. At this point, we know fom Lemma 17 that we can use a quey q l fo which thee is a oot splitting R and a split match π l. Since π l is a split match, the match fo each such sub-quey is esticted to a tee and thus we can tansfom each sub-quey of q l induced by a tem t in the oot choice sepaately. The following example is meant to illustate why the given bound of (Vas(q)) on the numbe of new vaiables and ole atoms that can be intoduced in a foest ewiting suffices. Figue 10 depicts the epesentation of a tee fom a canonical model, whee we use only the second pat of the names fo the elements, e.g., we use just ε instead of (a, ε). Fo simplicity, we also do not indicate the concepts and oles that label the nodes and edges, espectively. We use black colo to indicate the nodes 179

and edges that are used in the match for a query and dashed lines for short-cuts due to transitive roles. In the example, the grey edges are also those that belong to the forest base and the query match uses only short-cuts.

[Figure 10 (tree with nodes $\varepsilon$, 1, 11, 12, 111): A part of a representation of a canonical model, where the black nodes and edges are used in a match for a query and dashed edges indicate short-cuts due to transitive roles.]

The forest rewriting aims at making the short-cuts more explicit by replacing them with as few edges as necessary to obtain a tree match. In order to do this, we need to include the common ancestors in the forest base between each two nodes used in the match. For $w, w' \in \mathbb{N}^{*}$, we therefore define the longest common prefix (LCP) of $w$ and $w'$ as the longest $\hat{w} \in \mathbb{N}^{*}$ such that $\hat{w}$ is a prefix of both $w$ and $w'$. For a forest rewriting, we now determine the LCPs of any two nodes in the range of the match and add a variable for those LCPs that are not yet in the range of the match to the set $V$ of new variables used in the forest rewriting. In the example from Figure 10, the set $V$ contains a single variable $v_1$ for the node 1.

We now explicate the short-cuts as follows: for any edge used in the match, e.g., the edge from $\varepsilon$ to 111 in the example, we define its path as the sequence of elements on the path in the forest base, e.g., the path for the edge from $\varepsilon$ to 111 is $\varepsilon$, 1, 11, 111. The relevant path is obtained by dropping all elements from the path that are neither in the range of the mapping nor correspond to a variable in the set $V$, resulting in a relevant path of $\varepsilon$, 1, 111 for the example. We now replace the role atom that was matched to the edge from $\varepsilon$ to 111 with two role atoms such that the match uses the edge from $\varepsilon$ to 1 and from 1 to 111. An appropriate transitive sub-role exists since otherwise there could not be a short-cut. Similar arguments can be used to replace the role atom mapped to the edge from 111 to 12 and the one that is mapped to the edge from $\varepsilon$ to 12, resulting in a match as represented by Figure 11.
The given restriction on the cardinality of the set $V$ is no limitation since the number of LCPs in the set $V$ is maximal if there is no pair of nodes such that one is an ancestor of the other. We can see these nodes as $n$ leaf nodes of a tree that is at least binarily branching. Since such a tree can have at most $n$ inner nodes, we need at most $n$ new variables for a query in $n$ variables.
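The LCP computation and the relevant path from the Figure 10 example can be reproduced with a small sketch. It assumes that tree nodes are encoded as tuples of natural numbers, so that $\varepsilon$ is `()` and the node 111 is `(1, 1, 1)`; the function names are hypothetical and the set of matched nodes is read off the example, not taken from an implementation by the authors.

```python
from itertools import combinations


def lcp(w1, w2):
    """Longest common prefix of two tree nodes given as tuples of naturals."""
    prefix = []
    for a, b in zip(w1, w2):
        if a != b:
            break
        prefix.append(a)
    return tuple(prefix)


def new_variable_nodes(matched):
    """LCPs of any two matched nodes that are not themselves matched (the set V)."""
    return {lcp(u, v) for u, v in combinations(matched, 2)} - set(matched)


def path(u, v):
    """Nodes on the forest-base path from u to v: up to their LCP, then down."""
    top = lcp(u, v)
    up = [u[:k] for k in range(len(u), len(top), -1)]
    down = [v[:k] for k in range(len(top), len(v) + 1)]
    return up + down


def relevant_path(u, v, matched, new_nodes):
    """Drop path elements that are neither matched nor new variable nodes."""
    keep = set(matched) | set(new_nodes)
    return [w for w in path(u, v) if w in keep]


# Nodes used by the match in the example: epsilon, 111 and 12.
matched = {(), (1, 1, 1), (1, 2)}
V = new_variable_nodes(matched)                        # {(1,)}
short_cut = relevant_path((), (1, 1, 1), matched, V)   # [(), (1,), (1, 1, 1)]
```

The relevant path has one edge per role atom that replaces the original short-cut, here the two edges from $\varepsilon$ to 1 and from 1 to 111.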

[Figure 11 (tree with nodes $\varepsilon$, 1, 11, 12, 111): The match for a forest rewriting obtained from the example given in Figure 10.]

For the bound on the number of role atoms that can be used in the replacement of a single role atom, consider, for example, the cyclic query $q = \{r(x_1, x_2), r(x_2, x_3), r(x_3, x_4), t(x_1, x_4)\}$, for the knowledge base $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ with $\mathcal{T} = \emptyset$, $\mathcal{R} = \{r \sqsubseteq t\}$ with $t \in \mathrm{Trans}_{\mathcal{R}}$ and $\mathcal{A} = \{(\exists r.(\exists r.(\exists r.\top)))(a)\}$. It is not hard to check that $\mathcal{K} \models q$. Similarly to our running example from the previous section, there is also a single rewriting that is true in each canonical model of the KB, which is obtained by building only a forest rewriting and doing nothing in the other rewriting steps, except for choosing the empty set as root splitting in the split rewriting step. In the forest rewriting, we can explicate the short-cut used in the mapping for $t(x_1, x_4)$ by replacing $t(x_1, x_4)$ with $t(x_1, x_2), t(x_2, x_3), t(x_3, x_4)$.

By using Lemmas 15 to 18, we get the following theorem, which shows that we can use the ground queries in $\mathrm{ground}_{\mathcal{K}}(q)$ and the queries in $\mathrm{trees}_{\mathcal{K}}(q)$ in order to check whether $\mathcal{K}$ entails $q$, which is a well understood problem.

Theorem 19. Let $\mathcal{K}$ be a $\mathcal{SHIQ}$ knowledge base, $q$ a Boolean conjunctive query, and $\{q_1, \ldots, q_\ell\} = \mathrm{trees}_{\mathcal{K}}(q) \cup \mathrm{ground}_{\mathcal{K}}(q)$. Then $\mathcal{K} \models q$ iff $\mathcal{K} \models q_1 \vee \ldots \vee q_\ell$.

We now give upper bounds on the size and number of queries in $\mathrm{trees}_{\mathcal{K}}(q)$ and $\mathrm{ground}_{\mathcal{K}}(q)$. As before, we use $\sharp(S)$ to denote the cardinality of a set $S$. The size $|\mathcal{K}|$ ($|q|$) of a knowledge base $\mathcal{K}$ (a query $q$) is simply the number of symbols needed to write it over the alphabet of constructors, concept names, and role names that occur in $\mathcal{K}$ ($q$), where numbers are encoded in binary. Obviously, the number of atoms in a query is bounded by its size, hence $\sharp(q) \leq |q|$ and, for simplicity, we use $n$ as the size and the cardinality of $q$ in what follows.

Lemma 20. Let $q$ be a Boolean conjunctive query, $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ a $\mathcal{SHIQ}$ knowledge base, $|q| := n$ and $|\mathcal{K}| := m$. Then there is a polynomial $p$ such that
1. $\sharp(\mathrm{co}(q)) \leq 2^{p(n)}$ and, for each $q' \in \mathrm{co}(q)$, $|q'| \leq p(n)$,
2. $\sharp(\mathrm{sr}_{\mathcal{K}}(q)) \leq 2^{p(n) \cdot \log p(m)}$, and, for each $q' \in \mathrm{sr}_{\mathcal{K}}(q)$, $|q'| \leq p(n)$,
3. $\sharp(\mathrm{lr}_{\mathcal{K}}(q)) \leq 2^{p(n) \cdot \log p(m)}$, and, for each $q' \in \mathrm{lr}_{\mathcal{K}}(q)$, $|q'| \leq p(n)$,

4. $\sharp(\mathrm{fr}_{\mathcal{K}}(q)) \leq 2^{p(n) \cdot \log p(m)}$, and, for each $q' \in \mathrm{fr}_{\mathcal{K}}(q)$, $|q'| \leq p(n)$,
5. $\sharp(\mathrm{trees}_{\mathcal{K}}(q)) \leq 2^{p(n) \cdot \log p(m)}$, and, for each $q' \in \mathrm{trees}_{\mathcal{K}}(q)$, $|q'| \leq p(n)$, and
6. $\sharp(\mathrm{ground}_{\mathcal{K}}(q)) \leq 2^{p(n) \cdot \log p(m)}$, and, for each $q' \in \mathrm{ground}_{\mathcal{K}}(q)$, $|q'| \leq p(n)$.

As a consequence of the above lemma, there is a bound on the number of queries in $\mathrm{ground}_{\mathcal{K}}(q)$ and $\mathrm{trees}_{\mathcal{K}}(q)$ and it is not hard to see that the two sets can be computed in time polynomial in $m$ and exponential in $n$.

In the next section, we present an algorithm that decides entailment of unions of conjunctive queries, where each of the queries is either a ground query or consists of a single concept atom $C(x)$ for an existentially quantified variable $x$. By Theorem 19 and Lemma 20, such an algorithm is a decision procedure for arbitrary unions of conjunctive queries.

5.6 Summary and Discussion

In this section, we have presented the main technical foundations for answering (unions of) conjunctive queries. It is known that queries that contain non-simple roles in cycles among existentially quantified variables are difficult to handle. By applying the rewriting steps from Definition 10, we can rewrite such cyclic conjunctive queries into a set of acyclic and/or ground queries. Both types of queries are easier to handle and algorithms for both types exist. At this point, any reasoning algorithm for $\mathcal{SHIQ}^{\sqcap}$ knowledge base consistency can be used for deciding query entailment. In order to obtain tight complexity results, we present in the following section a decision procedure that is based on an extension of the translation to looping tree automata given by Tobies (2001).

It is worth mentioning that, for queries with only simple roles, our algorithm behaves exactly as the existing rewriting algorithms (i.e., the rolling-up and tuple graph technique) since, in this case, only the collapsing step is applicable. The need for identifying variables was first pointed out in the work of Horrocks et al. (1999) and is also required (although not mentioned) for the algorithm proposed by Calvanese et al. (1998a).
The new rewriting steps (split, loop, and forest rewriting) are only required for and applicable to non-simple roles and, when replacing a role atom, only transitive sub-roles of the replaced role can be used. Hence the number of resulting queries is in fact not determined by the size of the whole knowledge base, but by the number of transitive sub-roles for the non-simple roles in the query. Therefore, the number of resulting queries really depends on the number of transitive roles and the depth of the role hierarchy for the non-simple roles in the query, which can usually be expected to be small.

6. The Decision Procedure

We now devise a decision procedure for entailment of unions of Boolean conjunctive queries that uses, for each disjunct, the queries obtained in the rewriting process as defined in the previous section. Detailed proofs for the lemmas and theorems in this section can again be found in the appendix. For a knowledge base $\mathcal{K}$ and a union of Boolean conjunctive queries $q_1 \vee \ldots \vee q_\ell$, we show how we can use the queries in $\mathrm{trees}_{\mathcal{K}}(q_i)$ and $\mathrm{ground}_{\mathcal{K}}(q_i)$ for $1 \leq i \leq \ell$ in order to build a set of knowledge bases $\mathcal{K}_1, \ldots, \mathcal{K}_n$ such that $\mathcal{K} \models q_1 \vee \ldots \vee q_\ell$ iff all the $\mathcal{K}_i$ are inconsistent. This gives rise to two decision procedures: a deterministic one in which

we enumerate all $\mathcal{K}_i$, and which we use to derive a tight upper bound for the combined complexity; and a non-deterministic one in which we guess a $\mathcal{K}_i$, and which yields a tight upper bound for the data complexity. Recall that, for combined complexity, the knowledge base $\mathcal{K}$ and the queries $q_i$ both count as input, whereas for the data complexity only the ABox $\mathcal{A}$ counts as an input, and all other parts are assumed to be fixed.

6.1 A Deterministic Decision Procedure for Query Entailment in $\mathcal{SHIQ}$

We first define the deterministic version of the decision procedure and give an upper bound for its combined complexity. The given algorithm takes as input a union of connected conjunctive queries and works under the unique name assumption (UNA). We show afterwards how it can be extended to an algorithm that does not make the UNA and that takes arbitrary UCQs as input, and that the complexity results carry over.

We construct a set of knowledge bases that extend the original knowledge base $\mathcal{K}$ both w.r.t. the TBox and ABox. The extended knowledge bases are such that a given KB $\mathcal{K}$ entails a query $q$ iff all the extended KBs are inconsistent. We handle the concepts obtained from the tree-shaped queries differently to the ground queries: the axioms we add to the TBox prevent matches for the tree-shaped queries, whereas the extended ABoxes contain assertions that prevent matches for the ground queries.

Definition 21. Let $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ be a $\mathcal{SHIQ}$ knowledge base and $q = q_1 \vee \ldots \vee q_\ell$ a union of Boolean conjunctive queries. We set
1. $T := \mathrm{trees}_{\mathcal{K}}(q_1) \cup \ldots \cup \mathrm{trees}_{\mathcal{K}}(q_\ell)$,
2. $G := \mathrm{ground}_{\mathcal{K}}(q_1) \cup \ldots \cup \mathrm{ground}_{\mathcal{K}}(q_\ell)$, and
3. $\mathcal{T}_q := \{\top \sqsubseteq \neg C \mid C(v) \in T\}$.
An extended knowledge base $\mathcal{K}_q$ w.r.t. $\mathcal{K}$ and $q$ is a tuple $(\mathcal{T} \cup \mathcal{T}_q, \mathcal{R}, \mathcal{A} \cup \mathcal{A}_q)$ such that $\mathcal{A}_q$ contains, for each $q' \in G$, at least one assertion $\neg at$ with $at \in q'$.

Informally, the extended TBox $\mathcal{T} \cup \mathcal{T}_q$ ensures that there are no tree matches. Each extended ABox $\mathcal{A} \cup \mathcal{A}_q$ contains, for each ground query $q'$ obtained in the rewriting process, at least one assertion $\neg at$ with $at \in q'$ that spoils a match for $q'$.
A model for such an extended ABox can, therefore, not satisfy any of the ground queries. If there is a model for any of the extended knowledge bases, we know that this is a counter-model for the original query.

We can now use the extended knowledge bases in order to define the deterministic version of our algorithm for deciding entailment of unions of Boolean conjunctive queries in $\mathcal{SHIQ}$.

Definition 22. Given a $\mathcal{SHIQ}$ knowledge base $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ and a union of connected Boolean conjunctive queries $q$ as input, the algorithm answers "$\mathcal{K}$ entails $q$" if each extended knowledge base w.r.t. $\mathcal{K}$ and $q$ is inconsistent and it answers "$\mathcal{K}$ does not entail $q$" otherwise.

The following lemma shows that the above described algorithm is indeed correct.
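The control flow of Definition 22 can be illustrated with a small sketch that covers only the ABox part of the construction. Several things here are assumptions and not part of the paper: the TBox extension $\mathcal{T}_q$ is left out, atoms are encoded as tuples, the consistency test is passed in as a callback, and `toy_is_consistent` is a deliberately naive stand-in for a real $\mathcal{SHIQ}^{\sqcap}$ reasoner. The sketch enumerates only the extended ABoxes that are built by picking one atom per ground query; this is enough for the check, since any larger extended ABox contains one of these and adding assertions can never turn an inconsistent knowledge base into a consistent one.

```python
from itertools import product


def extended_aboxes(ground_queries):
    """Yield one set of negated atoms per way of picking an atom from each ground query."""
    for choice in product(*ground_queries):
        yield frozenset(("not", atom) for atom in choice)


def entails(abox, ground_queries, is_consistent):
    """Answer True ("K entails q") iff every extended ABox is inconsistent."""
    return all(
        not is_consistent(frozenset(abox) | extra)
        for extra in extended_aboxes(ground_queries)
    )


def toy_is_consistent(assertions):
    """Toy stand-in for a reasoner: only detects an atom asserted together with its negation."""
    return not any(("not", a) in assertions for a in assertions)


# Hypothetical ABox {A(a), r(a, b)} and three hypothetical sets of ground queries.
abox = {("A", "a"), ("r", "a", "b")}
matched = entails(abox, [[("A", "a"), ("r", "a", "b")]], toy_is_consistent)
spoiled = entails(abox, [[("A", "a"), ("B", "b")]], toy_is_consistent)
union = entails(abox, [[("A", "a")], [("B", "b")]], toy_is_consistent)
```

In the second call, negating `B(b)` yields a consistent extended ABox, so the toy procedure reports non-entailment; in the other two calls every choice clashes with the ABox.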

Lemma 23. Let $\mathcal{K}$ be a $\mathcal{SHIQ}$ knowledge base and $q$ a union of connected Boolean conjunctive queries. Given $\mathcal{K}$ and $q$ as input, the algorithm from Definition 22 answers "$\mathcal{K}$ entails $q$" iff $\mathcal{K} \models q$ under the unique name assumption.

In the proof of the if direction for the above lemma, we can use a canonical model $\mathcal{I}$ of $\mathcal{K}$ in order to guide the rewriting process. For the only if direction, we assume to the contrary of what is to be shown that there is no consistent extended knowledge base, but $\mathcal{K} \not\models q$. We then use a model $\mathcal{I}$ of $\mathcal{K}$ such that $\mathcal{I} \not\models q$, which exists by assumption, and show that $\mathcal{I}$ is also a model of some extended knowledge base.

6.1.1 Combined Complexity of Query Entailment in $\mathcal{SHIQ}$

According to the above lemma, the algorithm given in Definition 22 is correct. We now analyse its combined complexity and thereby prove that it is also terminating.

For the complexity analysis, we assume, as usual (Hustadt et al., 2005; Calvanese, De Giacomo, Lembo, Lenzerini, & Rosati, 2006; Ortiz et al., 2006b), that all concepts in concept atoms and ABox assertions are literals, i.e., concept names or negated concept names. If the input query or ABox contains non-literal atoms or assertions, we can easily transform these into literal ones in a truth preserving way: for each concept atom $C(t)$ in the query where $C$ is a non-literal concept, we introduce a new atomic concept $A_C \in N_C$, add the axiom $C \sqsubseteq A_C$ to the TBox, and replace $C(t)$ with $A_C(t)$; for each non-literal concept assertion $C(a)$ in the ABox, we introduce a new atomic concept $A_C \in N_C$, add an axiom $A_C \sqsubseteq C$ to the TBox, and replace $C(a)$ with $A_C(a)$. Such a transformation is obviously polynomial, so without loss of generality, it is safe to assume that the ABox and query contain only literal concepts. This has the advantage that the size of each atom and ABox assertion is constant.

Since our algorithm involves checking the consistency of a $\mathcal{SHIQ}^{\sqcap}$ knowledge base, we analyse the complexity of this reasoning service.
Tobies (2001) shows an ExpTime upper bound for deciding the consistency of $\mathcal{SHIQ}$ knowledge bases (even with binary coding of numbers) by translating a $\mathcal{SHIQ}$ KB to an equisatisfiable $\mathcal{ALCQI}b$ knowledge base. The $b$ stands for safe Boolean role expressions built from $\mathcal{ALCQI}b$ roles using the operators $\sqcap$ (role intersection), $\sqcup$ (role union), and $\neg$ (role negation/complement) such that, when transformed into disjunctive normal form, every disjunct contains at least one non-negated conjunct. Given a query $q$ and a $\mathcal{SHIQ}$ knowledge base $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$, we reduce query entailment to deciding knowledge base consistency of an extended $\mathcal{SHIQ}^{\sqcap}$ knowledge base $\mathcal{K}_q = (\mathcal{T} \cup \mathcal{T}_q, \mathcal{R}, \mathcal{A} \cup \mathcal{A}_q)$. Recall that $\mathcal{T}_q$ and $\mathcal{A}_q$ are the only parts that contain role conjunctions and that we use role negation only in ABox assertions.

We extend the translation given for $\mathcal{SHIQ}$ so that it can be used for deciding the consistency of $\mathcal{SHIQ}^{\sqcap}$ KBs. Although the translation works for all $\mathcal{SHIQ}^{\sqcap}$ KBs, we assume the input KB to be of exactly the form of extended knowledge bases as described above. This is so because the translation for unrestricted $\mathcal{SHIQ}^{\sqcap}$ is no longer polynomial, as in the case of $\mathcal{SHIQ}$, but exponential in the size of the longest role conjunction under a universal quantifier. Since role conjunctions occur only in the extended ABox and TBox, and since the size of each role conjunction is, by Lemma 20, polynomial in the size of $q$, the translation is only exponential in the size of the query in the case of extended knowledge bases.

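We assume here, as usual, that all concepts are in negation normal form (NNF); any concept can be transformed in linear time into an equivalent one in NNF by pushing negation inwards, making use of de Morgan's laws and the duality between existential and universal restrictions, and between at-most and at-least number restrictions ($\leq n\, r.C$ and $\geq n\, r.C$ respectively) (Horrocks et al., 2000). For a concept $C$, we use $\dot{\neg} C$ to denote the NNF of $\neg C$.

We define the closure $\mathrm{cl}(C, \mathcal{R})$ of a concept $C$ w.r.t. a role hierarchy $\mathcal{R}$ as the smallest set satisfying the following conditions:
- if $D$ is a sub-concept of $C$, then $D \in \mathrm{cl}(C, \mathcal{R})$,
- if $D \in \mathrm{cl}(C, \mathcal{R})$, then $\dot{\neg} D \in \mathrm{cl}(C, \mathcal{R})$,
- if $\forall r.D \in \mathrm{cl}(C, \mathcal{R})$, $s \sqsubseteq^{*}_{\mathcal{R}} r$, and $s \in \mathrm{Trans}_{\mathcal{R}}$, then $\forall s.D \in \mathrm{cl}(C, \mathcal{R})$.

We now show how we can extend the translation from $\mathcal{SHIQ}$ to $\mathcal{ALCQI}b$ given by Tobies. We first consider $\mathcal{SHIQ}^{\sqcap}$-concepts and then extend the translation to KBs.

Definition 24. For a role hierarchy $\mathcal{R}$ and roles $r, r_1, \ldots, r_n$, let
$$\uparrow(r, \mathcal{R}) = \bigsqcap_{r \sqsubseteq^{*}_{\mathcal{R}} s} s \quad \text{and} \quad \uparrow(r_1 \sqcap \ldots \sqcap r_n, \mathcal{R}) = \uparrow(r_1, \mathcal{R}) \sqcap \ldots \sqcap \uparrow(r_n, \mathcal{R}).$$
Please note that, since $r \sqsubseteq^{*}_{\mathcal{R}} r$, $r$ occurs in $\uparrow(r, \mathcal{R})$.

Lemma 25. Let $\mathcal{R}$ be a role hierarchy, and $r_1, \ldots, r_n$ roles. For every interpretation $\mathcal{I}$ such that $\mathcal{I} \models \mathcal{R}$, it holds that $(\uparrow(r_1 \sqcap \ldots \sqcap r_n, \mathcal{R}))^{\mathcal{I}} = (r_1 \sqcap \ldots \sqcap r_n)^{\mathcal{I}}$.

With the extended definition of $\uparrow$ on role conjunctions, we can now adapt the definition (Def. 6.22) that Tobies provides for translating $\mathcal{SHIQ}$-concepts into $\mathcal{ALCQI}b$-concepts.

Definition 26. Let $C$ be a $\mathcal{SHIQ}^{\sqcap}$-concept in NNF and $\mathcal{R}$ a role hierarchy. For every concept $\forall(r_1 \sqcap \ldots \sqcap r_n).D \in \mathrm{cl}(C, \mathcal{R})$, let $X_{r_1 \sqcap \ldots \sqcap r_n, D} \in N_C$ be a unique concept name that does not occur in $\mathrm{cl}(C, \mathcal{R})$. Given a role hierarchy $\mathcal{R}$, we define the function $\mathrm{tr}$ inductively on the structure of concepts by setting
$$\begin{array}{rcl}
\mathrm{tr}(A, \mathcal{R}) &=& A \quad \text{for all } A \in N_C \\
\mathrm{tr}(\neg A, \mathcal{R}) &=& \neg A \quad \text{for all } A \in N_C \\
\mathrm{tr}(C_1 \sqcap C_2, \mathcal{R}) &=& \mathrm{tr}(C_1, \mathcal{R}) \sqcap \mathrm{tr}(C_2, \mathcal{R}) \\
\mathrm{tr}(C_1 \sqcup C_2, \mathcal{R}) &=& \mathrm{tr}(C_1, \mathcal{R}) \sqcup \mathrm{tr}(C_2, \mathcal{R}) \\
\mathrm{tr}(\bowtie n\,(r_1 \sqcap \ldots \sqcap r_n).D, \mathcal{R}) &=& (\bowtie n \uparrow(r_1 \sqcap \ldots \sqcap r_n, \mathcal{R}).\mathrm{tr}(D, \mathcal{R})) \\
\mathrm{tr}(\forall(r_1 \sqcap \ldots \sqcap r_n).D, \mathcal{R}) &=& X_{r_1 \sqcap \ldots \sqcap r_n, D} \\
\mathrm{tr}(\exists(r_1 \sqcap \ldots \sqcap r_n).D, \mathcal{R}) &=& \neg(X_{r_1 \sqcap \ldots \sqcap r_n, \dot{\neg} D})
\end{array}$$
where $\bowtie$ stands for $\leq$ or $\geq$. Set
$$\mathrm{tc}((r_1 \sqcap \ldots \sqcap r_n), \mathcal{R}) := \{(t_1 \sqcap \ldots \sqcap t_n) \mid t_i \sqsubseteq^{*}_{\mathcal{R}} r_i \text{ and } t_i \in \mathrm{Trans}_{\mathcal{R}} \text{ for each } i \text{ such that } 1 \leq i \leq n\}$$
and define an extended TBox $\mathcal{T}_{C,\mathcal{R}}$ as
$$\begin{array}{rcl}
\mathcal{T}_{C,\mathcal{R}} &=& \{X_{r_1 \sqcap \ldots \sqcap r_n, D} \equiv \forall\uparrow(r_1 \sqcap \ldots \sqcap r_n, \mathcal{R}).\mathrm{tr}(D, \mathcal{R}) \mid \forall(r_1 \sqcap \ldots \sqcap r_n).D \in \mathrm{cl}(C, \mathcal{R})\} \;\cup \\
&& \{X_{r_1 \sqcap \ldots \sqcap r_n, D} \sqsubseteq \forall\uparrow(T, \mathcal{R}).X_{T,D} \mid T \in \mathrm{tc}(r_1 \sqcap \ldots \sqcap r_n, \mathcal{R})\}
\end{array}$$

Lemma 27. Let $C$ be a $\mathcal{SHIQ}^{\sqcap}$-concept in NNF, $\mathcal{R}$ a role hierarchy, and $\mathrm{tr}$ and $\mathcal{T}_{C,\mathcal{R}}$ as defined in Definition 26. The concept $C$ is satisfiable w.r.t. $\mathcal{R}$ iff the $\mathcal{ALCQI}b$-concept $\mathrm{tr}(C, \mathcal{R})$ is satisfiable w.r.t. $\mathcal{T}_{C,\mathcal{R}}$.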
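The upward closure of Definition 24 and the set $\mathrm{tc}$ from Definition 26 can be explored with a small sketch. It rests on simplifying assumptions that are not in the paper: inverse roles are ignored, a role hierarchy is a set of `(sub, super)` pairs over role names, a role conjunction is a tuple of role names, and all function names are hypothetical.

```python
from itertools import product


def super_roles(role, inclusions):
    """All s with role <=* s, i.e. the reflexive-transitive closure of the inclusions."""
    seen, todo = {role}, [role]
    while todo:
        current = todo.pop()
        for sub, sup in inclusions:
            if sub == current and sup not in seen:
                seen.add(sup)
                todo.append(sup)
    return seen


def up(conjunction, inclusions):
    """Set of conjuncts of the upward closure of r1 & ... & rn."""
    result = set()
    for role in conjunction:
        result |= super_roles(role, inclusions)
    return result


def tc(conjunction, inclusions, transitive):
    """All tuples (t1, ..., tn) with ti <=* ri and ti transitive."""
    candidates = [
        sorted(t for t in transitive if r in super_roles(t, inclusions))
        for r in conjunction
    ]
    return set(product(*candidates))


# Hypothetical hierarchy: t1 <= r, t2 <= r, r <= s, with t1, t2 and s transitive.
inclusions = {("t1", "r"), ("t2", "r"), ("r", "s")}
transitive = {"t1", "t2", "s"}
closure = up(("r",), inclusions)                   # {"r", "s"}
choices = tc(("r", "s"), inclusions, transitive)   # 2 * 3 = 6 conjunctions
```

With two transitive sub-roles for `r` and three for `s`, the conjunction `("r", "s")` already yields six elements in `tc`, which illustrates why the number of axioms in $\mathcal{T}_{C,\mathcal{R}}$ can grow exponentially in the length of the longest role conjunction.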

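Given Lemma 25, the proof of Lemma 27 is a long, but straightforward extension of the proof given by Tobies (2001, Lemma 6.23).

We now analyse the complexity of the above described problem. Let $m := |\mathcal{R}|$ and $r_1 \sqcap \ldots \sqcap r_n$ the longest role conjunction occurring in $C$, i.e., the maximal number of roles that occur in a role conjunction in $C$ is $n$. The TBox $\mathcal{T}_{C,\mathcal{R}}$ can contain exponentially many axioms in $n$ since the cardinality of the set $\mathrm{tc}((r_1 \sqcap \ldots \sqcap r_n), \mathcal{R})$ for the longest role conjunction can only be bounded by $m^n$ because each $r_i$ can have more than one transitive sub-role. It is not hard to check that the size of each axiom is polynomial in $|C|$. Since deciding whether an $\mathcal{ALCQI}b$ concept $C$ is satisfiable w.r.t. an $\mathcal{ALCQI}b$ TBox $\mathcal{T}$ is an ExpTime-complete problem (even with binary coding of numbers) (Tobies, 2001, Thm. 4.42), the satisfiability of a $\mathcal{SHIQ}^{\sqcap}$-concept $C$ can be checked in time $2^{p(m) 2^{p(n)}}$.

We now extend the translation from concepts to knowledge bases. Tobies assumes that all role assertions in the ABox are of the form $r(a, b)$ with $r$ a role name or the inverse of a role name. Extended ABoxes contain, however, also negated roles in role assertions, which require a different translation. A positive role assertion such as $r(a, b)$ is translated in the standard way by closing the role upwards. The only difference from using $\uparrow$ directly is that we additionally split the conjunction $(\uparrow(r, \mathcal{R}))(a, b) = (r_1 \sqcap \ldots \sqcap r_n)(a, b)$ into $n$ different role assertions $r_1(a, b), \ldots, r_n(a, b)$, which is clearly justified by the semantics. For negated roles in a role assertion such as $\neg r(a, b)$, we close the role downwards instead of upwards and add a role atom $\neg s(a, b)$ for each sub-role $s$ of $r$. This is again justified by the semantics. Let $\mathcal{K} = (\mathcal{T} \cup \mathcal{T}_q, \mathcal{R}, \mathcal{A} \cup \mathcal{A}_q)$ be an extended knowledge base. More precisely, we set
$$\begin{array}{rcl}
\mathrm{tr}(\mathcal{T} \cup \mathcal{T}_q, \mathcal{R}) &:=& \{\mathrm{tr}(C, \mathcal{R}) \sqsubseteq \mathrm{tr}(D, \mathcal{R}) \mid C \sqsubseteq D \in \mathcal{T} \cup \mathcal{T}_q\}, \\
\mathrm{tr}(\mathcal{A} \cup \mathcal{A}_q, \mathcal{R}) &:=& \{(\mathrm{tr}(C, \mathcal{R}))(a) \mid C(a) \in \mathcal{A} \cup \mathcal{A}_q\} \;\cup \\
&& \{s(a, b) \mid r(a, b) \in \mathcal{A} \cup \mathcal{A}_q \text{ and } r \sqsubseteq^{*}_{\mathcal{R}} s\} \;\cup \\
&& \{\neg s(a, b) \mid \neg r(a, b) \in \mathcal{A} \cup \mathcal{A}_q \text{ and } s \sqsubseteq^{*}_{\mathcal{R}} r\},
\end{array}$$
and we use $\mathrm{tr}(\mathcal{K}, \mathcal{R})$ to denote the $\mathcal{ALCQI}b$ knowledge base $(\mathrm{tr}(\mathcal{T} \cup \mathcal{T}_q, \mathcal{R}), \mathrm{tr}(\mathcal{A} \cup \mathcal{A}_q, \mathcal{R}))$.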
For the complexity of deciding the consistency of a translated $\mathcal{SHIQ}^{\sqcap}$ knowledge base, we can apply the same arguments as above for concept satisfiability, which gives the following result:

Lemma 28. Given a $\mathcal{SHIQ}^{\sqcap}$ knowledge base $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ where $m := |\mathcal{K}|$ and the size of the longest role conjunction is $n$, we can decide consistency of $\mathcal{K}$ in deterministic time $2^{p(m) 2^{p(n)}}$ with $p$ a polynomial.

We are now ready to show that the algorithm given in Definition 22 runs in deterministic time single exponential in the size of the input KB and double exponential in the size of the input query.

Lemma 29. Let $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ be a $\mathcal{SHIQ}$ knowledge base with $m = |\mathcal{K}|$ and $q$ a union of connected Boolean conjunctive queries with $n = |q|$. Given $\mathcal{K}$ and $q$ as input, the algorithm given in Definition 22 decides whether $\mathcal{K} \models q$ under the unique name assumption in deterministic time in $2^{p(m) 2^{p(n)}}$.

In the proof of the above lemma, we show that there is some polynomial $p$ such that we have to check at most $2^{p(m) 2^{p(n)}}$ extended knowledge bases for consistency and that each consistency check can be done in this time bound as well.

More precisely, let $q = q_1 \vee \ldots \vee q_\ell$, $T = \mathrm{trees}_{\mathcal{K}}(q_1) \cup \ldots \cup \mathrm{trees}_{\mathcal{K}}(q_\ell)$, and $G = \mathrm{ground}_{\mathcal{K}}(q_1) \cup \ldots \cup \mathrm{ground}_{\mathcal{K}}(q_\ell)$. Together with Lemma 20, we get that $\sharp(T)$ and $\sharp(G)$ are bounded by $2^{p(n) \cdot \log p(m)}$ for some polynomial $p$ and that the size of each query in $G$ and $T$ is polynomial in $n$. Each of the $2^{p(n) \cdot \log p(m)}$ ground queries in $G$ contributes at most $p(n)$ negated assertions to an extended ABox $\mathcal{A}_q$. Hence, there are at most $2^{p(m) 2^{p(n)}}$ extended ABoxes $\mathcal{A}_q$ and, therefore, $2^{p(m) 2^{p(n)}}$ extended knowledge bases that have to be tested for consistency.

Given the bounds on the cardinalities of $T$ and $G$ and the fact that the size of each query in $T$ and $G$ is polynomial in $n$, it is not hard to check that the size of each extended knowledge base $\mathcal{K}_q = (\mathcal{T} \cup \mathcal{T}_q, \mathcal{R}, \mathcal{A} \cup \mathcal{A}_q)$ is bounded by $2^{p(n) \cdot \log p(m)}$ and that each $\mathcal{K}_q$ can be computed in this time bound as well. Since only the extended parts contain role conjunctions and the number of roles in a role conjunction is polynomial in $n$, there is a polynomial $p$ such that
1. $|\mathrm{tr}(\mathcal{T}, \mathcal{R})| \leq p(m)$,
2. $|\mathrm{tr}(\mathcal{T}_q, \mathcal{R})| \leq 2^{p(n) \cdot \log p(m)}$,
3. $|\mathrm{tr}(\mathcal{A}, \mathcal{R})| \leq p(m)$,
4. $|\mathrm{tr}(\mathcal{A}_q, \mathcal{R})| \leq 2^{p(n) \cdot \log p(m)}$, and, hence,
5. $|\mathrm{tr}(\mathcal{K}_q, \mathcal{R})| \leq 2^{p(n) \cdot \log p(m)}$.

By Lemma 28, each consistency check can be done in time $2^{p(m) 2^{p(n)}}$ for some polynomial $p$. Since we have to check at most $2^{p(m) 2^{p(n)}}$ extended knowledge bases for consistency, and each check can be done in time $2^{p(m) 2^{p(n)}}$, we obtain the desired upper bound.

We now show that this result carries over even when we do not restrict interpretations to the unique name assumption.

Definition 30. Let $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ be a $\mathcal{SHIQ}$ knowledge base and $q$ a $\mathcal{SHIQ}$ union of Boolean conjunctive queries. For a partition $\mathcal{P}$ of $\mathrm{Inds}(\mathcal{A})$, a knowledge base $\mathcal{K}^{\mathcal{P}} = (\mathcal{T}, \mathcal{R}, \mathcal{A}^{\mathcal{P}})$ and a query $q^{\mathcal{P}}$ are called an $\mathcal{A}$-partition w.r.t. $\mathcal{K}$ and $q$ if $\mathcal{A}^{\mathcal{P}}$ and $q^{\mathcal{P}}$ are obtained from $\mathcal{A}$ and $q$ as follows: For each $P \in \mathcal{P}$
1. Choose one individual name $a \in P$.
2. For each $b \in P$, replace each occurrence of $b$ in $\mathcal{A}$ and $q$ with $a$.

Please note that w.l.o.g. we assume that all constants that occur in the query occur in the ABox as well and that thus a partition of the individual names in the ABox also partitions the query.

Lemma 31. Let $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ be a $\mathcal{SHIQ}$ knowledge base and $q$ a union of Boolean conjunctive queries. $\mathcal{K} \not\models q$ without making the unique name assumption iff there is an $\mathcal{A}$-partition $\mathcal{K}^{\mathcal{P}} = (\mathcal{T}, \mathcal{R}, \mathcal{A}^{\mathcal{P}})$ and $q^{\mathcal{P}}$ w.r.t. $\mathcal{K}$ and $q$ such that $\mathcal{K}^{\mathcal{P}} \not\models q^{\mathcal{P}}$ under the unique name assumption.
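The $\mathcal{A}$-partitions of Definition 30 can be enumerated with a short sketch. The encoding is an assumption made for illustration: assertions and query atoms are tuples whose first component is a predicate name and whose remaining components are individual names or variables, the representative of a block is simply its first element, and the function names are hypothetical.

```python
def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into one of the existing blocks ...
        for i, block in enumerate(partition):
            yield partition[:i] + [[first] + block] + partition[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition


def apply_partition(atoms, partition):
    """Replace every individual by the chosen representative of its block."""
    representative = {b: block[0] for block in partition for b in block}
    return {
        (atom[0],) + tuple(representative.get(x, x) for x in atom[1:])
        for atom in atoms
    }


# Hypothetical ABox over the individuals a, b and c.
abox = {("r", "a", "b"), ("A", "c")}
partitions = list(set_partitions(["a", "b", "c"]))
merged = apply_partition(abox, [["a", "b"], ["c"]])   # b is replaced by a
```

Terms that are not individual names of the ABox, such as query variables, are left untouched by `apply_partition`, so the same function can be applied to the atoms of the query.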

Let $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ be a knowledge base in a Description Logic $\mathcal{DL}$, $\mathcal{C}$ be the complexity class such that deciding whether $\mathcal{K} \models q$ under the unique name assumption is in $\mathcal{C}$, and let $n = 2^{|\mathcal{A}|}$. Since the number of partitions for an ABox is at most exponential in the number of individual names that occur in the ABox, the following is a straightforward consequence of the above lemma: for a Boolean conjunctive $\mathcal{DL}$ query $q$, deciding whether $\mathcal{K} \models q$ without making the unique name assumption can be reduced to deciding $n$ times a problem in $\mathcal{C}$.

In order to extend our algorithm to unions of possibly unconnected Boolean conjunctive queries, we first transform the input query $q$ into conjunctive normal form (CNF). We then check entailment for each conjunct $q_i$, which is now a union of connected Boolean conjunctive queries. The algorithm returns "$\mathcal{K}$ entails $q$" if each entailment check succeeds and it answers "$\mathcal{K}$ does not entail $q$" otherwise. By Lemma 5 and Lemma 23, the algorithm is correct.

Let $\mathcal{K}$ be a knowledge base in a Description Logic $\mathcal{DL}$, $q$ a union of connected Boolean conjunctive $\mathcal{DL}$ queries, and $\mathcal{C}$ the complexity class such that deciding whether $\mathcal{K} \models q$ is in $\mathcal{C}$. Let $q'$ be a union of possibly unconnected Boolean conjunctive queries and $\mathrm{cnf}(q')$ the CNF of $q'$. Since the number of conjuncts in $\mathrm{cnf}(q')$ is at most exponential in $|q'|$, deciding whether $\mathcal{K} \models q'$ can be reduced to deciding $n$ times a problem in $\mathcal{C}$, with $n = 2^{p(|q'|)}$ and $p$ a polynomial.

The above observation together with the results from Lemma 29 gives the following general result:

Theorem 32. Let $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ be a $\mathcal{SHIQ}$ knowledge base with $m = |\mathcal{K}|$ and $q$ a union of Boolean conjunctive queries with $n = |q|$. Deciding whether $\mathcal{K} \models q$ can be done in deterministic time in $2^{p(m) 2^{p(n)}}$.

A corresponding lower bound follows from the work by Lutz (2007). Hence the above result is tight. The result improves the known co-3NExpTime upper bound for the setting where the roles in the query are restricted to simple ones (Ortiz, Calvanese, & Eiter, 2006a).

Corollary 33.
Let $\mathcal{K}$ be a $\mathcal{SHIQ}$ knowledge base with $m = |\mathcal{K}|$ and $q$ a union of Boolean conjunctive queries with $n = |q|$. Deciding whether $\mathcal{K} \models q$ is a 2ExpTime-complete problem.

Regarding query answering, we refer back to the end of Section 2.2, where we explain that deciding which tuples belong to the set of answers can be checked with at most $m_{\mathcal{A}}^{k}$ entailment tests, where $k$ is the number of answer variables in the query and $m_{\mathcal{A}}$ is the number of individual names in $\mathrm{Inds}(\mathcal{A})$. Hence, at least theoretically, this is absorbed by the combined complexity of query entailment in $\mathcal{SHIQ}$.

6.2 A Non-Deterministic Decision Procedure for Query Entailment in $\mathcal{SHIQ}$

In order to study the data complexity of query entailment, we devise a non-deterministic decision procedure which provides a tight bound for the complexity of the problem. Actually, the devised algorithm decides non-entailment of queries: we guess an extended knowledge base $\mathcal{K}_q$, check whether it is consistent, and return "$\mathcal{K}$ does not entail $q$" if the check succeeds and "$\mathcal{K}$ entails $q$" otherwise.

Definition 34. Let $\mathcal{T}$ be a $\mathcal{SHIQ}$ TBox, $\mathcal{R}$ a $\mathcal{SHIQ}$ role hierarchy, and $q$ a union of Boolean conjunctive queries. Given a $\mathcal{SHIQ}$ ABox $\mathcal{A}$ as input, the algorithm guesses an

$\mathcal{A}$-partition $\mathcal{K}^{\mathcal{P}} = (\mathcal{T}, \mathcal{R}, \mathcal{A}^{\mathcal{P}})$ and $q^{\mathcal{P}}$ w.r.t. $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ and $q$. The query $q^{\mathcal{P}}$ is then transformed into CNF and one of the resulting conjuncts, say $q_i^{\mathcal{P}}$, is chosen. The algorithm then guesses an extended knowledge base $\mathcal{K}^{\mathcal{P}}_{q_i} = (\mathcal{T} \cup \mathcal{T}_{q_i}, \mathcal{R}, \mathcal{A}^{\mathcal{P}} \cup \mathcal{A}^{\mathcal{P}}_{q_i})$ w.r.t. $\mathcal{K}^{\mathcal{P}}$ and $q_i^{\mathcal{P}}$ and returns "$\mathcal{K}$ does not entail $q$" if $\mathcal{K}^{\mathcal{P}}_{q_i}$ is consistent and it returns "$\mathcal{K}$ entails $q$" otherwise.

Compared to the deterministic version of the algorithm given in Definition 22, we do not make the UNA but guess a partition of the individual names. We also non-deterministically choose one of the conjuncts that result from the transformation into CNF. For this conjunct, we guess an extended ABox and check whether the extended knowledge base for the guessed ABox is consistent and, therefore, has a model that is a counter-model for the query entailment.

In its (equivalent) negated form, Lemma 23 says that $\mathcal{K} \not\models q$ iff there is an extended knowledge base $\mathcal{K}_q$ w.r.t. $\mathcal{K}$ and $q$ such that $\mathcal{K}_q$ is consistent. Together with Lemma 31 it follows, therefore, that the algorithm from Definition 34 is correct.

6.2.1 Data Complexity of Query Entailment in $\mathcal{SHIQ}$

We now analyze the data complexity of the algorithm given in Definition 34 and show that deciding UCQ entailment in $\mathcal{SHIQ}$ is indeed in co-NP for data complexity.

Theorem 35. Let $\mathcal{T}$ be a $\mathcal{SHIQ}$ TBox, $\mathcal{R}$ a $\mathcal{SHIQ}$ role hierarchy, and $q$ a union of Boolean conjunctive queries. Given a $\mathcal{SHIQ}$ ABox $\mathcal{A}$ with $m_a = |\mathcal{A}|$, the algorithm from Definition 34 decides in non-deterministic polynomial time in $m_a$ whether $\mathcal{K} \not\models q$ for $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$.

Clearly, the size of an ABox $\mathcal{A}^{\mathcal{P}}$ in an $\mathcal{A}$-partition is bounded by $m_a$. Since the query is no longer an input, its size is constant and the transformation to CNF can be done in constant time. We then non-deterministically choose one of the resulting conjuncts. Let this conjunct be $q_i = q_{(i,1)} \vee \ldots \vee q_{(i,\ell)}$. As established in Lemma 32, the maximal size of an extended ABox $\mathcal{A}^{\mathcal{P}}_{q_i}$ is polynomial in $m_a$. Hence, $|\mathcal{A}^{\mathcal{P}} \cup \mathcal{A}^{\mathcal{P}}_{q_i}| \leq p(m_a)$ for some polynomial $p$.
Due to Lemma 20 and since the size of $q$, $\mathcal{T}$, and $\mathcal{R}$ is fixed by assumption, the sets $\mathrm{trees}_{\mathcal{K}^{\mathcal{P}}}(q_{(i,j)})$ and $\mathrm{ground}_{\mathcal{K}^{\mathcal{P}}}(q_{(i,j)})$ for each $j$ such that $1 \leq j \leq \ell$ can be computed in time polynomial in $m_a$. From Lemma 29, we know that the translation of an extended knowledge base into an $\mathcal{ALCQI}b$ knowledge base is polynomial in $m_a$ and a close inspection of the algorithm by Tobies (2001) for deciding consistency of an $\mathcal{ALCQI}b$ knowledge base shows that its runtime is also polynomial in $m_a$.

The bound given in Theorem 35 is tight since the data complexity of conjunctive query entailment is already co-NP-hard for the $\mathcal{ALE}$ fragment of $\mathcal{SHIQ}$ (Schaerf, 1993).

Corollary 36. Conjunctive query entailment in $\mathcal{SHIQ}$ is data complete for co-NP.

Due to the correspondence between query containment and query answering (Calvanese et al., 1998a), the algorithm can also be used to decide containment of two unions of conjunctive queries over a $\mathcal{SHIQ}$ knowledge base, which gives the following result:

Corollary 37. Given a $\mathcal{SHIQ}$ knowledge base $\mathcal{K}$ and two unions of conjunctive queries $q$ and $q'$, the problem whether $\mathcal{K} \models q \subseteq q'$ is decidable.

By using the result of Rosati (2006a, Thm. 11), we further show that the consistency of a $\mathcal{SHIQ}$ knowledge base extended with (weakly-safe) Datalog rules is decidable.

Corollary 38. The consistency of $\mathcal{SHIQ}$+log-KBs (both under FOL semantics and under NM semantics) is decidable.

7. Conclusions

With the decision procedure presented for entailment of unions of conjunctive queries in $\mathcal{SHIQ}$, we close a long standing open problem. The solution has immediate consequences on related areas, as it shows that several other open problems such as query answering, query containment and the extension of a knowledge base with weakly safe Datalog rules for $\mathcal{SHIQ}$ are decidable as well. Regarding combined complexity, we present a deterministic algorithm that needs time single exponential in the size of the KB and double exponential in the size of the query, which gives a tight upper bound for the problem. This result shows that deciding conjunctive query entailment is strictly harder than instance checking for $\mathcal{SHIQ}$. We further prove co-NP-completeness for data complexity. Interestingly, this shows that regarding data complexity deciding UCQ entailment is (at least theoretically) not harder than instance checking for $\mathcal{SHIQ}$, which was also a previously open question.

It will be part of our future work to extend this procedure to $\mathcal{SHOIQ}$, which is the DL underlying OWL DL. We will also attempt to find more implementable algorithms for query answering in $\mathcal{SHIQ}$. Carrying out the query rewriting steps in a more goal directed way will be crucial to achieving this.

Acknowledgments

This work was supported by the EU funded IST-2005-7603 FET Project Thinking Ontologies (TONES). Birte Glimm was supported by an EPSRC studentship.

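Appendix A. Complete Proofs

Lemma (7). Let $\mathcal{K}$ be a $\mathcal{SHIQ}$ knowledge base and $q = q_1 \vee \ldots \vee q_n$ a union of conjunctive queries, then $\mathcal{K} \not\models q$ iff there exists a canonical model $\mathcal{I}$ of $\mathcal{K}$ such that $\mathcal{I} \not\models q$.

Proof of Lemma 7. The if direction is trivial.

For the only if direction, since an inconsistent knowledge base entails every query, we can assume that $\mathcal{K}$ is consistent. Hence, there is an interpretation $\mathcal{I}' = (\Delta^{\mathcal{I}'}, \cdot^{\mathcal{I}'})$ such that $\mathcal{I}' \models \mathcal{K}$ and $\mathcal{I}' \not\models q$. From $\mathcal{I}'$, we construct a canonical model $\mathcal{I}$ for $\mathcal{K}$ and its forest base $\mathcal{J}$ as follows: we define the set $P \subseteq (\Delta^{\mathcal{I}'})^{*}$ of paths to be the smallest set such that
- for all $a \in \mathrm{Inds}(\mathcal{A})$, $a^{\mathcal{I}'}$ is a path;
- $d_1 \cdots d_n \cdot d$ is a path, if
  - $d_1 \cdots d_n$ is a path,
  - $(d_n, d) \in r^{\mathcal{I}'}$ for some role $r$,
  - if there is an $a \in \mathrm{Inds}(\mathcal{A})$ such that $d = a^{\mathcal{I}'}$, then $n > 2$.

For a path $p = d_1 \cdots d_n$, the length $\mathrm{len}(p)$ of $p$ is $n$. Now fix a set $S \subseteq \mathrm{Inds}(\mathcal{A}) \times \mathbb{N}^{*}$ and a bijection $f : S \to P$ such that
(i) $\mathrm{Inds}(\mathcal{A}) \times \{\varepsilon\} \subseteq S$,
(ii) for each $a \in \mathrm{Inds}(\mathcal{A})$, $\{w \mid (a, w) \in S\}$ is a tree,
(iii) $f((a, \varepsilon)) = a^{\mathcal{I}'}$,
(iv) if $(a, w), (a, w') \in S$ with $w'$ a successor of $w$, then $f((a, w')) = f((a, w)) \cdot d$ for some $d \in \Delta^{\mathcal{I}'}$.

For all $(a, w) \in S$, set $\mathrm{Tail}((a, w)) := d_n$ if $f((a, w)) = d_1 \cdots d_n$. Now, define a forest base $\mathcal{J} = (\Delta^{\mathcal{J}}, \cdot^{\mathcal{J}})$ for $\mathcal{K}$ as follows:
(a) $\Delta^{\mathcal{J}} := S$;
(b) for each $a \in \mathrm{Inds}(\mathcal{A})$, $a^{\mathcal{J}} := (a, \varepsilon) \in S$;
(c) for each $b \in N_I \setminus \mathrm{Inds}(\mathcal{A})$, $b^{\mathcal{J}} = a^{\mathcal{J}}$ for some fixed $a \in \mathrm{Inds}(\mathcal{A})$;
(d) for each $C \in N_C$, $(a, w) \in C^{\mathcal{J}}$ if $(a, w) \in S$ and $\mathrm{Tail}((a, w)) \in C^{\mathcal{I}'}$;
(e) for all roles $r$, $((a, w), (b, w')) \in r^{\mathcal{J}}$ if either
  (I) $w = w' = \varepsilon$ and $(a^{\mathcal{I}'}, b^{\mathcal{I}'}) \in r^{\mathcal{I}'}$, or
  (II) $a = b$, $w'$ is a neighbor of $w$ and $(\mathrm{Tail}((a, w)), \mathrm{Tail}((b, w'))) \in r^{\mathcal{I}'}$.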

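It is clear that $\mathcal{J}$ is a forest base for $\mathcal{K}$ due to the definition of $S$ and the construction of $\mathcal{J}$ from $S$. Let $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$ be an interpretation that is identical to $\mathcal{J}$ except that, for all non-simple roles $r$, we set

$$r^{\mathcal{I}} = r^{\mathcal{J}} \cup \bigcup_{s \sqsubseteq^*_{\mathcal{R}} r,\ s \in \mathsf{Trans}_{\mathcal{R}}} (s^{\mathcal{J}})^+ .$$

It is tedious but not too hard to verify that $\mathcal{I} \models \mathcal{K}$ and that $\mathcal{J}$ is a forest base for $\mathcal{I}$. Hence $\mathcal{I}$ is a canonical model for $\mathcal{K}$. Therefore, we only have to show that $\mathcal{I} \not\models q$. Assume to the contrary that $\mathcal{I} \models q$. Then there is some $\pi$ and $i$ with $1 \le i \le n$ such that $\mathcal{I} \models^{\pi} q_i$. We now define a mapping $\pi' : \mathsf{Terms}(q_i) \to \Delta^{\mathcal{I}'}$ by setting $\pi'(t) := \mathsf{Tail}(\pi(t))$ for all $t \in \mathsf{Terms}(q_i)$. It is not difficult to check that $\mathcal{I}' \models^{\pi'} q_i$ and hence $\mathcal{I}' \models^{\pi'} q$, which is a contradiction.

Lemma (15). Let $\mathcal{I}$ be a model for $\mathcal{K}$.
1. If $\mathcal{I} \models q$, then there is a collapsing $q_{co}$ of $q$ such that $\mathcal{I} \models^{\pi_{co}} q_{co}$ for $\pi_{co}$ an injection modulo $\approx^*$.
2. If $\mathcal{I} \models^{\pi_{co}} q_{co}$ for a collapsing $q_{co}$ of $q$, then $\mathcal{I} \models q$.

Proof of Lemma 15. For (1), let $\pi$ be such that $\mathcal{I} \models^{\pi} q$, and let $q_{co}$ be the collapsing of $q$ that is obtained by adding an atom $t \approx t'$ for all terms $t, t' \in \mathsf{Terms}(q)$ for which $\pi(t) = \pi(t')$. By definition of the semantics, $\mathcal{I} \models^{\pi} q_{co}$ and $\pi$ is an injection modulo $\approx^*$. Condition (2) trivially holds since $q \subseteq q_{co}$ and hence $\mathcal{I} \models^{\pi_{co}} q$.

Lemma (16). Let $\mathcal{I}$ be a model for $\mathcal{K}$.
1. If $\mathcal{I}$ is canonical and $\mathcal{I} \models^{\pi} q$, then there is a pair $(q_{sr}, R) \in \mathsf{sr}_{\mathcal{K}}(q)$ and a split match $\pi_{sr}$ such that $\mathcal{I} \models^{\pi_{sr}} q_{sr}$, $R$ is the induced root splitting of $\pi_{sr}$, and $\pi_{sr}$ is an injection modulo $\approx^*$.
2. If $(q_{sr}, R) \in \mathsf{sr}_{\mathcal{K}}(q)$ and $\mathcal{I} \models^{\pi_{sr}} q_{sr}$ for some match $\pi_{sr}$, then $\mathcal{I} \models q$.

Proof of Lemma 16. The proof of the second claim is relatively straightforward: since $(q_{sr}, R) \in \mathsf{sr}_{\mathcal{K}}(q)$, there is a collapsing $q_{co}$ of $q$ such that $q_{sr}$ is a split rewriting of $q_{co}$. Since all roles replaced in a split rewriting are non-simple and $\mathcal{I} \models q_{sr}$ by assumption, we have that $\mathcal{I} \models q_{co}$. By Lemma 15 (2), we then have that $\mathcal{I} \models q$ as required.

We go through the proof of the first claim in more detail: let $q_{co}$ be in $\mathsf{co}(q)$ such that $\mathcal{I} \models^{\pi_{co}} q_{co}$ for a match $\pi_{co}$ that is injective modulo $\approx^*$. Such a collapsing $q_{co}$ and match $\pi_{co}$ exist due to Lemma 15. If $\pi_{co}$ is a split match w.r.t. $q$ and $\mathcal{I}$ already, we are done, since a split match induces a root splitting $R$ and $(q_{co}, R)$ is trivially in $\mathsf{sr}_{\mathcal{K}}(q)$. If $\pi_{co}$ is not a split match, there are at least two terms $t, t'$ with $r(t, t') \in q_{co}$ such that $\pi_{co}(t) = (a, w)$, $\pi_{co}(t') = (a', w')$, $a \ne a'$, and $w \ne \varepsilon$ or $w' \ne \varepsilon$. We distinguish two cases: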

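1. Both $t$ and $t'$ are not mapped to roots, i.e., $w \ne \varepsilon$ and $w' \ne \varepsilon$. Since $\mathcal{I} \models^{\pi_{co}} r(t, t')$, we have that $(\pi_{co}(t), \pi_{co}(t')) \in r^{\mathcal{I}}$. Since $\mathcal{I}$ is a canonical model for $\mathcal{K}$, there must be a role $s$ with $s \sqsubseteq^*_{\mathcal{R}} r$ and $s \in \mathsf{Trans}_{\mathcal{R}}$ such that $\{(\pi_{co}(t), (a, \varepsilon)), ((a, \varepsilon), (a', \varepsilon)), ((a', \varepsilon), \pi_{co}(t'))\} \subseteq s^{\mathcal{I}}$. If there is some $\hat{t} \in \mathsf{Terms}(q_{co})$ such that $\pi_{co}(\hat{t}) = (a, \varepsilon)$, then let $u = \hat{t}$, otherwise let $u$ be a fresh variable. Similarly, if there is some $\hat{t}' \in \mathsf{Terms}(q_{co})$ such that $\pi_{co}(\hat{t}') = (a', \varepsilon)$, then let $u' = \hat{t}'$, otherwise let $u'$ be a fresh variable. Hence, we can define a split rewriting $q_{sr}$ of $q_{co}$ by replacing $r(t, t')$ with $s(t, u)$, $s(u, u')$, and $s(u', t')$. We then define a new mapping $\pi_{sr}$ that agrees with $\pi_{co}$ on all terms that occur in $q_{co}$ and that maps $u$ to $(a, \varepsilon)$ and $u'$ to $(a', \varepsilon)$.

2. Either $t$ or $t'$ is mapped to a root. W.l.o.g., let this be $t$, i.e., $\pi(t) = (a, \varepsilon)$. We can use the same arguments as above: since $\mathcal{I} \models^{\pi_{co}} r(t, t')$, we have that $(\pi(t), \pi(t')) \in r^{\mathcal{I}}$ and, since $\mathcal{I}$ is a canonical model for $\mathcal{K}$, there must be a role $s$ with $s \sqsubseteq^*_{\mathcal{R}} r$ and $s \in \mathsf{Trans}_{\mathcal{R}}$ such that $\{(\pi(t), (a', \varepsilon)), ((a', \varepsilon), \pi(t'))\} \subseteq s^{\mathcal{I}}$. If there is some $\hat{t} \in \mathsf{Terms}(q_{co})$ such that $\pi_{co}(\hat{t}) = (a', \varepsilon)$, then let $u = \hat{t}$, otherwise let $u$ be a fresh variable. We then define a split rewriting $q_{sr}$ of $q_{co}$ by replacing $r(t, t')$ with $s(t, u)$, $s(u, t')$ and a mapping $\pi_{sr}$ that agrees with $\pi_{co}$ on all terms that occur in $q_{co}$ and that maps $u$ to $(a', \varepsilon)$.

It immediately follows that $\mathcal{I} \models^{\pi_{sr}} q_{sr}$. We can proceed as described above for each role atom $r(t, t')$ for which $\pi(t) = (a, w)$ and $\pi(t') = (a', w')$ with $a \ne a'$ and $w \ne \varepsilon$ or $w' \ne \varepsilon$. This will result in a split rewriting $q_{sr}$ and a split match $\pi_{sr}$ such that $\mathcal{I} \models^{\pi_{sr}} q_{sr}$. Furthermore, $\pi_{sr}$ is injective modulo $\approx^*$ since we only introduce a new variable when that variable is mapped to an element that is not yet in the range of the match. Since $\pi_{sr}$ is a split match, it induces a root splitting $R$ and, hence, $(q_{sr}, R) \in \mathsf{sr}_{\mathcal{K}}(q)$ as required.

Lemma (17). Let $\mathcal{I}$ be a model of $\mathcal{K}$.
1. If $\mathcal{I}$ is canonical and $\mathcal{I} \models q$, then there is a pair $(q_{lr}, R) \in \mathsf{lr}_{\mathcal{K}}(q)$ and a mapping $\pi_{lr}$ such that $\mathcal{I} \models^{\pi_{lr}} q_{lr}$, $\pi_{lr}$ is an injection modulo $\approx^*$, $R$ is the root splitting induced by $\pi_{lr}$ and, for each $r(t, t) \in q_{lr}$, $t \in R$.
2. If $(q_{lr}, R) \in \mathsf{lr}_{\mathcal{K}}(q)$ and $\mathcal{I} \models^{\pi_{lr}} q_{lr}$ for some match $\pi_{lr}$, then $\mathcal{I} \models q$.

Proof of Lemma 17. The proof of (2) is analogous to the one given in Lemma 16 since, by definition of loop rewritings, all roles replaced in a loop rewriting are again non-simple.

For (1), let $(q_{sr}, R) \in \mathsf{sr}_{\mathcal{K}}(q)$ be such that $\mathcal{I} \models^{\pi_{sr}} q_{sr}$, $\pi_{sr}$ is a split match, and $R$ is the root splitting induced by $\pi_{sr}$. Such a split rewriting $q_{sr}$ and match $\pi_{sr}$ exist due to Lemma 16 and the canonicity of $\mathcal{I}$. Let $r(t, t) \in q_{sr}$ for $t \notin R$. Since $R$ is the root splitting induced by $\pi_{sr}$ and since $t \notin R$, $\pi_{sr}(t) = (a, w)$ for some $a \in \mathsf{Inds}(\mathcal{A})$ and $w \ne \varepsilon$. Now, let $\mathcal{J}$ be a forest base for $\mathcal{I}$. We show that there exists a neighbor $d$ of $\pi_{sr}(t)$ and a role $s \in \mathsf{Trans}_{\mathcal{R}}$ such that $s \sqsubseteq^*_{\mathcal{R}} r$ and $(\pi_{sr}(t), d) \in s^{\mathcal{I}} \cap \mathsf{Inv}(s)^{\mathcal{I}}$. Since $\mathcal{I} \models^{\pi_{sr}} q_{sr}$, we have $(\pi_{sr}(t), \pi_{sr}(t)) \in r^{\mathcal{I}}$. Since $\mathcal{J}$ is a forest base and since $w \ne \varepsilon$, we have $(\pi_{sr}(t), \pi_{sr}(t)) \notin r^{\mathcal{J}}$. It follows that there is a sequence $d_1, \ldots, d_n \in \Delta^{\mathcal{I}}$ and a role $s \in \mathsf{Trans}_{\mathcal{R}}$ such that $s \sqsubseteq^*_{\mathcal{R}} r$, $d_1 = \pi_{sr}(t) = d_n$, and $(d_i, d_{i+1}) \in s^{\mathcal{J}}$ for $1 \le i < n$ and $d_i \ne d_1$ for each $i$ with $1 < i < n$. Then it is not hard to see that, because $\{w' \mid (a, w') \in \Delta^{\mathcal{I}}\}$ is a tree and $w \ne \varepsilon$, we have $d_2 = d_{n-1}$. Since $(d_1, d_2) \in s^{\mathcal{J}}$ and $(d_{n-1}, d_n) \in s^{\mathcal{J}}$ with $d_{n-1} = d_2$ and $d_n = d_1$, the role $s$ and the element $d = d_2$ are as required.

For each $r(t, t) \in q_{sr}$ with $t \notin R$, select an element $d_{r,t}$ and a role $s_{r,t}$ as described above. Now let $q_{lr}$ be obtained from $q_{sr}$ by doing the following for each $r(t, t) \in q_{sr}$ with $t \notin R$:

- if $d_{r,t} = \pi_{sr}(t')$ for some $t' \in \mathsf{Terms}(q_{sr})$, then replace $r(t, t)$ with $s_{r,t}(t, t')$ and $s_{r,t}(t', t)$;
- otherwise, introduce a new variable $v_{r,t} \in N_V$ and replace $r(t, t)$ with $s_{r,t}(t, v_{r,t})$ and $s_{r,t}(v_{r,t}, t)$.

Let $\pi_{lr}$ be obtained from $\pi_{sr}$ by extending it with $\pi_{lr}(v_{r,t}) = d_{r,t}$ for each newly introduced variable $v_{r,t}$. By definition of $q_{lr}$ and $\pi_{lr}$, $q_{lr}$ is connected, $\pi_{lr}$ is injective modulo $\approx^*$, and $\mathcal{I} \models^{\pi_{lr}} q_{lr}$.

Lemma (18). Let $\mathcal{I}$ be a model of $\mathcal{K}$.
1. If $\mathcal{I}$ is canonical and $\mathcal{I} \models q$, then there is a pair $(q_{fr}, R) \in \mathsf{fr}_{\mathcal{K}}(q)$ such that $\mathcal{I} \models^{\pi_{fr}} q_{fr}$ for a forest match $\pi_{fr}$, $R$ is the induced root splitting of $\pi_{fr}$, and $\pi_{fr}$ is an injection modulo $\approx^*$.
2. If $(q_{fr}, R) \in \mathsf{fr}_{\mathcal{K}}(q)$ and $\mathcal{I} \models^{\pi_{fr}} q_{fr}$ for some match $\pi_{fr}$, then $\mathcal{I} \models q$.

Proof of Lemma 18. The proof of (2) is again analogous to the one given in Lemma 16.

For (1), let $(q_{lr}, R) \in \mathsf{lr}_{\mathcal{K}}(q)$ be such that $\mathcal{I} \models^{\pi_{lr}} q_{lr}$, $R$ is the root splitting induced by $\pi_{lr}$, $\pi_{lr}$ is injective modulo $\approx^*$ and, for each $r(t, t) \in q_{lr}$, $t \in R$. Such a loop rewriting and match $\pi_{lr}$ exist due to Lemma 17 and the canonicity of $\mathcal{I}$. By definition, $R$ is a root splitting w.r.t. $q_{lr}$ and $\mathcal{K}$.

For $w, w' \in \mathbb{N}^*$, the longest common prefix (LCP) of $w, w'$ is the longest $w^* \in \mathbb{N}^*$ such that $w^*$ is a prefix of both $w$ and $w'$. For the match $\pi_{lr}$ we now define the set $D$ as follows:

$$D := \mathsf{ran}(\pi_{lr}) \cup \{(a, w) \in \Delta^{\mathcal{I}} \mid w \text{ is the LCP of some } w', w'' \text{ with } (a, w'), (a, w'') \in \mathsf{ran}(\pi_{lr})\}.$$

Let $V \subseteq N_V \setminus \mathsf{Vars}(q_{lr})$ be such that, for each $d \in D \setminus \mathsf{ran}(\pi_{lr})$, there is a unique $v_d \in V$. We now define a mapping $\pi_{fr}$ as $\pi_{lr} \cup \{v_d \in V \mapsto d\}$. By definition of $V$ and $v_d$, $\pi_{fr}$ is a split match as well. The set $V \cup \mathsf{Vars}(q_{lr})$ will be the set of variables for the new query $q_{fr}$. Note that $\mathsf{ran}(\pi_{fr}) = D$.

Fact (a): if $(a, w), (a, w') \in \mathsf{ran}(\pi_{fr})$, then $(a, w'') \in \mathsf{ran}(\pi_{fr})$, where $w''$ is the LCP of $w$ and $w'$.

Fact (b): $\sharp(V) \le \sharp(\mathsf{Vars}(q_{lr}))$. This holds because, in the worst case, all $(a, w)$ in $\mathsf{ran}(\pi_{lr})$ are incomparable and can thus be seen as leaves of a binarily branching tree. Now, a tree that has $n$ leaves and is at least binarily branching at every non-leaf has at most $n$ inner nodes, and thus $\sharp(V) \le \sharp(\mathsf{Vars}(q_{lr}))$.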
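The definition of the set D and the counting argument of Fact (b) can be illustrated with a short Python sketch. It rests on assumptions that are not part of the paper: domain elements are modelled as pairs of an individual name and a tuple of natural numbers, and the helper names `lcp` and `lcp_closure` are hypothetical.

```python
from itertools import combinations


def lcp(w1, w2):
    """Longest common prefix of two words, given as tuples of naturals."""
    prefix = []
    for x, y in zip(w1, w2):
        if x != y:
            break
        prefix.append(x)
    return tuple(prefix)


def lcp_closure(ran):
    """Return ran together with (a, LCP(w', w'')) for every two elements
    (a, w'), (a, w'') of ran that sit in the tree of the same individual a."""
    closed = set(ran)
    for (a1, w1), (a2, w2) in combinations(ran, 2):
        if a1 == a2:
            closed.add((a1, lcp(w1, w2)))
    return closed


# A hypothetical range of a match: three elements below a, one below b.
ran = {("a", (1, 1)), ("a", (1, 2)), ("a", (2,)), ("b", (3,))}
D = lcp_closure(ran)
new_elements = D - ran  # these elements would receive the fresh variables in V
```

Here two elements are added, `("a", (1,))` and the root `("a", ())`, so the number of fresh variables stays below the number of elements already in the range, in line with Fact (b).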

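For a pair of individuals $d, d' \in \Delta^{\mathcal{I}}$, the path from $d$ to $d'$ is the (unique) shortest sequence of elements $d_1, \ldots, d_n \in \Delta^{\mathcal{I}}$ such that $d_1 = d$, $d_n = d'$, and $d_{i+1}$ is a neighbor of $d_i$ for all $1 \le i < n$. The length of a path is the number of elements in it, i.e., the path $d_1, \ldots, d_n$ is of length $n$. The relevant path $d'_1, \ldots, d'_\ell$ from $d$ to $d'$ is the sub-sequence of $d_1, \ldots, d_n$ that is obtained by dropping all elements $d_i \notin D$.

Claim 1. Let $r(t, t') \in \mathsf{subq}(q_{lr}, t_r)$ for some $t_r \in R$ and let $d'_1, \ldots, d'_\ell$ be the relevant path from $d = d'_1 = \pi_{lr}(t)$ to $d' = d'_\ell = \pi_{lr}(t')$. If $\ell > 2$, there is a role $s \in \mathsf{Trans}_{\mathcal{R}}$ such that $s \sqsubseteq^*_{\mathcal{R}} r$ and $(d'_i, d'_{i+1}) \in s^{\mathcal{I}}$ for all $1 \le i < \ell$.

Proof. Let $d_1, \ldots, d_n$ be the path and $d'_1, \ldots, d'_\ell$ the relevant path from $\pi_{lr}(t)$ to $\pi_{lr}(t')$. Then $\ell > 2$ implies $n > 2$. We have to show that there is a role $s$ as in the claim. Let $\mathcal{J}$ be a forest base for $\mathcal{I}$. Since $\mathcal{I} \models^{\pi_{lr}} q_{lr}$, $n > 2$ implies $(\pi_{lr}(t), \pi_{lr}(t')) \in r^{\mathcal{I}} \setminus r^{\mathcal{J}}$. Since $\mathcal{I}$ is based on $\mathcal{J}$, it follows that there is an $s \in \mathsf{Trans}_{\mathcal{R}}$ such that $s \sqsubseteq^*_{\mathcal{R}} r$ and $(d_i, d_{i+1}) \in s^{\mathcal{J}}$ for all $1 \le i < n$. By construction of $\mathcal{I}$ from $\mathcal{J}$, it follows that $(d'_i, d'_{i+1}) \in s^{\mathcal{I}}$ for all $1 \le i < \ell$, which finishes the proof of the claim.

Now let $q_{fr}$ be obtained from $q_{lr}$ as follows: for each role atom $r(t, t') \in \mathsf{subq}(q_{lr}, t_r)$ with $t_r \in R$, if the length of the relevant path $d'_1, \ldots, d'_\ell$ from $d = d'_1 = \pi_{lr}(t)$ to $d' = d'_\ell = \pi_{lr}(t')$ is greater than 2, then select a role $s$ as in Claim 1 and variables $t_j$ with $\pi_{fr}(t_j) = d'_j \in D$, and replace the atom $r(t, t')$ with $s(t_1, t_2), \ldots, s(t_{\ell-1}, t_\ell)$, where $t = t_1$ and $t' = t_\ell$. Please note that these $t_j$ can be chosen in a "don't care" non-deterministic way since $\pi_{fr}$ is injective modulo $\approx^*$, i.e., if $\pi_{fr}(t_j) = d'_j = \pi_{fr}(t'_j)$, then $t_j \approx^* t'_j$ and we can pick any of these.

We now have to show that (i) $\mathcal{I} \models^{\pi_{fr}} q_{fr}$, and (ii) $\pi_{fr}$ is a forest match.

For (i), let $r(t, t') \in q_{lr} \setminus q_{fr}$ and let $s(t_1, t_2), \ldots, s(t_{\ell-1}, t_\ell)$ be the atoms that replaced $r(t, t')$. Since $\mathcal{I} \models^{\pi_{lr}} q_{lr}$, $\mathcal{I} \models^{\pi_{lr}} r(t, t')$ and $(\pi_{lr}(t), \pi_{lr}(t')) \in r^{\mathcal{I}}$. Since $r(t, t')$ was replaced in $q_{fr}$, the length of the relevant path from $\pi_{lr}(t)$ to $\pi_{lr}(t')$ is greater than 2. Hence, it must be the case that $(\pi_{lr}(t), \pi_{lr}(t')) \in r^{\mathcal{I}} \setminus r^{\mathcal{J}}$. Let $d_1, \ldots, d_n$ with $d_1 = \pi_{lr}(t)$ and $d_n = \pi_{lr}(t')$ be the path from $\pi_{lr}(t)$ to $\pi_{lr}(t')$ and $d'_1, \ldots, d'_\ell$ the relevant path from $\pi_{lr}(t)$ to $\pi_{lr}(t')$. By construction of $\mathcal{I}$ from $\mathcal{J}$, this means that there is a role $s \in \mathsf{Trans}_{\mathcal{R}}$ such that $s \sqsubseteq^*_{\mathcal{R}} r$ and $(d_i, d_{i+1}) \in s^{\mathcal{J}}$ for all $1 \le i < n$. Again by construction of $\mathcal{I}$, this means $(d'_i, d'_{i+1}) \in s^{\mathcal{I}}$ for $1 \le i < \ell$ as required. Hence $\mathcal{I} \models^{\pi_{fr}} s(t_i, t_{i+1})$ for each $i$ with $i < \ell$ by definition of $\pi_{fr}$.

For (ii): the mapping $\pi_{fr}$ differs from $\pi_{lr}$ only for the newly introduced variables. Furthermore, we only introduced new role atoms within a sub-query $\mathsf{subq}(q_{lr}, t_r)$ and $\pi_{lr}$ is a split match by assumption. Hence, $\pi_{fr}$ is trivially a split match and we only have to show that $\pi_{fr}$ is a forest match. Since $\pi_{fr}$ is a split match, we can do this "tree by tree". For each $a \in \mathsf{Inds}(\mathcal{A})$, let $T_a := \{w \mid (a, w) \in \mathsf{ran}(\pi_{fr})\}$. We need to construct a mapping $f$ as specified in Definition 14, and we start with its root $t_r$. If $T_a \ne \emptyset$, let $t_r \in \mathsf{Terms}(q)$ be the unique term such that $\pi_{fr}(t_r) = (a, w_r)$ and there is no $t \in \mathsf{Terms}(q)$ such that $\pi_{fr}(t) = (a, w)$ and $w$ is a proper prefix of $w_r$. Such a term exists since $\pi_{fr}$ is a split match and it is unique due to Fact (a) above. Define a trace to be a sequence $\bar{w} = w_1 \cdots w_n \in T_a^+$ such that

- $w_1 = w_r$;
- for all $1 \le i < n$, $w_i$ is the longest proper prefix of $w_{i+1}$.

Since $\mathcal{I}$ is canonical, each $w_i \in T_a$ is in $\mathbb{N}^*$. It is not hard to see that $T = \{\bar{w} \mid \bar{w} \text{ is a trace}\} \cup \{\varepsilon\}$ is a tree. For a trace $\bar{w} = w_1 \cdots w_n$, let $\mathsf{Tail}(\bar{w}) = w_n$. Define a mapping $f$ that maps each term $t$ with $\pi_{fr}(t) = (a, w) \in T_a$ to the unique trace $\bar{w}_t$ such that $w = \mathsf{Tail}(\bar{w}_t)$. Let $r(t, t') \in q_{fr}$ such that $\pi_{fr}(t), \pi_{fr}(t') \in T_a$. By construction of $q_{fr}$, this implies that the length of the relevant path from $\pi_{fr}(t)$ to $\pi_{fr}(t')$ is exactly 2. Thus, $f(t)$ and $f(t')$ are neighbors in $T$ and, hence, $\pi_{fr}$ is a forest match as required.

Theorem (19). Let $\mathcal{K}$ be a SHIQ knowledge base, $q$ a Boolean conjunctive query, and $\{q_1, \ldots, q_\ell\} = \mathsf{trees}_{\mathcal{K}}(q) \cup \mathsf{ground}_{\mathcal{K}}(q)$. Then $\mathcal{K} \models q$ iff $\mathcal{K} \models q_1 \vee \ldots \vee q_\ell$.

Proof of Theorem 19. For the "if" direction: let us assume that $\mathcal{K} \models q_1 \vee \ldots \vee q_\ell$. Hence, for each model $\mathcal{I}$ of $\mathcal{K}$, there is a query $q_i$ with $1 \le i \le \ell$ such that $\mathcal{I} \models q_i$. We distinguish two cases: (i) $q_i \in \mathsf{trees}_{\mathcal{K}}(q)$ and (ii) $q_i \in \mathsf{ground}_{\mathcal{K}}(q)$.

For (i): $q_i$ is of the form $C(v)$ where $C$ is the query concept for some query $q_{fr}$ w.r.t. $v \in \mathsf{Vars}(q_{fr})$ and $(q_{fr}, \emptyset) \in \mathsf{fr}_{\mathcal{K}}(q)$. Hence $\mathcal{I} \models^{\pi} q_i$ for some match $\pi$, and thus $\mathcal{I} \models^{\pi} C(v)$. Let $d \in \Delta^{\mathcal{I}}$ with $d = \pi(v) \in C^{\mathcal{I}}$. By Lemma 12, we then have that $\mathcal{I} \models q_{fr}$ and, by Lemma 18, we then have that $\mathcal{I} \models q$ as required.

For (ii): since $q_i \in \mathsf{ground}_{\mathcal{K}}(q)$, there is some pair $(q_{fr}, R) \in \mathsf{fr}_{\mathcal{K}}(q)$ such that $q_i = \mathsf{ground}(q_{fr}, R, \tau)$. We show that $\mathcal{I} \models^{\pi_{fr}} q_{fr}$ for some match $\pi_{fr}$. Since $\mathcal{I} \models q_i$, there is a match $\pi_i$ such that $\mathcal{I} \models^{\pi_i} q_i$. We now construct the match $\pi_{fr}$. For each $t \in R$, $q_i$ contains a concept atom $C(\tau(t))$ where $C = \mathsf{con}(\mathsf{subq}(q_{fr}, t), t)$ is the query concept of $\mathsf{subq}(q_{fr}, t)$ w.r.t. $t$. Since $\mathcal{I} \models^{\pi_i} C(\tau(t))$ and by Lemma 12, there is a match $\pi_t$ such that $\mathcal{I} \models^{\pi_t} \mathsf{subq}(q_{fr}, t)$. We now define $\pi_{fr}$ as the union of $\pi_t$, for each $t \in R$. Please note that $\pi_{fr}(t) = \pi_i(\tau(t))$. Since $\mathsf{Inds}(q_{fr}) \subseteq R$ and $\tau$ is such that, for each $a \in \mathsf{Inds}(q_{fr})$, $\tau(a) = a$ and $\tau(t) = \tau(t')$ iff $t \approx^* t'$, it follows that $\mathcal{I} \models^{\pi_{fr}} at$ for each atom $at \in q_{fr}$ such that $at$ contains only terms from the root choice $R$ and hence $\mathcal{I} \models^{\pi_{fr}} q_{fr}$ as required.

For the "only if" direction we have to show that, if $\mathcal{K} \models q$, then $\mathcal{K} \models q_1 \vee \ldots \vee q_\ell$, so let us assume that $\mathcal{K} \models q$. By Lemma 7 in its negated form we have that $\mathcal{K} \models q$ iff all canonical models $\mathcal{I}$ of $\mathcal{K}$ are such that $\mathcal{I} \models q$. Hence, we can restrict our attention to the canonical models of $\mathcal{K}$. By Lemma 18, $\mathcal{I} \models \mathcal{K}$ and $\mathcal{I} \models q$ implies that there is a pair $(q_{fr}, R) \in \mathsf{fr}_{\mathcal{K}}(q)$ such that $\mathcal{I} \models^{\pi_{fr}} q_{fr}$ for a forest match $\pi_{fr}$, $R$ is the induced root splitting of $\pi_{fr}$, and $\pi_{fr}$ is an injection modulo $\approx^*$. We again distinguish two cases: (i) $R = \emptyset$, i.e., the root splitting is empty and $\pi_{fr}$ is a tree match, and (ii) $R \ne \emptyset$, i.e., the root splitting is non-empty and $\pi_{fr}$ is a forest match but not a tree match.

For (i): since $(q_{fr}, \emptyset) \in \mathsf{fr}_{\mathcal{K}}(q)$, there is some $v \in \mathsf{Terms}(q_{fr})$ such that $C = \mathsf{con}(q_{fr}, v)$ and $q_i = C(v)$. By Lemma 12 and, since $\mathcal{I} \models q_{fr}$, there is an element $d \in \Delta^{\mathcal{I}}$ such that $d \in C^{\mathcal{I}}$. Hence $\mathcal{I} \models^{\pi} C(v)$ with $\pi : v \mapsto d$ as required.

For (ii): since $R$ is the root splitting induced by $\pi_{fr}$, for each $t \in R$ there is some $a_t \in \mathsf{Inds}(\mathcal{A})$ such that $\pi_{fr}(t) = (a_t, \varepsilon)$. We now define the mapping $\tau : R \to \mathsf{Inds}(\mathcal{A})$ as follows: for each $t \in R$, $\tau(t) = a_t$ iff $\pi_{fr}(t) = (a_t, \varepsilon)$. By definition of $\mathsf{ground}(q_{fr}, R, \tau)$, $q_i = \mathsf{ground}(q_{fr}, R, \tau) \in \mathsf{ground}_{\mathcal{K}}(q)$. Since $\mathcal{I} \models^{\pi_{fr}} q_{fr}$, $\mathcal{I} \models \mathsf{subq}(q_{fr}, t)$ for each $t \in R$.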

Since $q_{fr}$ is forest-shaped, each $\mathsf{subq}(q_{fr}, t)$ is tree-shaped. Then, by Lemma 12, $\mathcal{I} \models q'_i$, where $q'_i$ is the query obtained from $q_{fr}$ by replacing each sub-query $\mathsf{subq}(q_{fr}, t)$ with $C(t)$ for $C$ the query concept of $\mathsf{subq}(q_{fr}, t)$ w.r.t. $t$. By definition of $\tau$ from the forest match $\pi_{fr}$, it is clear that $\mathcal{I} \models \mathsf{ground}(q_{fr}, R, \tau)$ as required.

Lemma (20). Let $q$ be a Boolean conjunctive query, $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ a SHIQ knowledge base, $|q| := n$ and $|\mathcal{K}| := m$. Then there is a polynomial $p$ such that
1. $\sharp(\mathsf{co}(q)) \le 2^{p(n)}$ and, for each $q' \in \mathsf{co}(q)$, $|q'| \le p(n)$,
2. $\sharp(\mathsf{sr}_{\mathcal{K}}(q)) \le 2^{p(n) \cdot \log p(m)}$ and, for each $q' \in \mathsf{sr}_{\mathcal{K}}(q)$, $|q'| \le p(n)$,
3. $\sharp(\mathsf{lr}_{\mathcal{K}}(q)) \le 2^{p(n) \cdot \log p(m)}$ and, for each $q' \in \mathsf{lr}_{\mathcal{K}}(q)$, $|q'| \le p(n)$,
4. $\sharp(\mathsf{fr}_{\mathcal{K}}(q)) \le 2^{p(n) \cdot \log p(m)}$ and, for each $q' \in \mathsf{fr}_{\mathcal{K}}(q)$, $|q'| \le p(n)$,
5. $\sharp(\mathsf{trees}_{\mathcal{K}}(q)) \le 2^{p(n) \cdot \log p(m)}$ and, for each $q' \in \mathsf{trees}_{\mathcal{K}}(q)$, $|q'| \le p(n)$, and
6. $\sharp(\mathsf{ground}_{\mathcal{K}}(q)) \le 2^{p(n) \cdot \log p(m)}$ and, for each $q' \in \mathsf{ground}_{\mathcal{K}}(q)$, $|q'| \le p(n)$.

Proof of Lemma 20.

1. The set $\mathsf{co}(q)$ contains those queries obtained from $q$ by adding at most $n$ equality atoms to $q$. The number of collapsings corresponds, therefore, to building all equivalence classes over the terms in $q$ by $\approx^*$. Hence, the cardinality of the set $\mathsf{co}(q)$ is at most exponential in $n$. Since we add at most one equality atom for each pair of terms, the size of a query $q' \in \mathsf{co}(q)$ is at most $n + n^2$, and $|q'|$ is, therefore, polynomial in $n$.

2. For each of the at most $n$ role atoms, we can choose to do nothing, replace the atom with two atoms, or replace it with three atoms. For every replacement, we can choose to introduce a new variable or re-use one of the existing variables. If we introduce a new variable every time, the new query contains at most $3n$ terms. Since $\mathcal{K}$ can contain at most $m$ non-simple roles that are a sub-role of a role used in role atoms of $q$, we have at most $m$ roles to choose from when replacing a role atom. Overall, this gives us at most $1 + m(3n) + m(3n)(3n)$ choices for each of the at most $n$ role atoms in a query and, therefore, the number of split rewritings for each query $q' \in \mathsf{co}(q)$ is polynomial in $m$ and exponential in $n$. In combination with the results from (1), this also shows that the overall number of split rewritings is polynomial in $m$ and exponential in $n$. Since we add at most two new role atoms for each of the existing role atoms, the size of a query $q' \in \mathsf{sr}_{\mathcal{K}}(q)$ is linear in $n$.

3. There are at most $n$ role atoms of the form $r(t, t)$ in a query $q' \in \mathsf{sr}_{\mathcal{K}}(q)$ that could give rise to a loop rewriting, at most $m$ non-simple sub-roles of $r$ in $\mathcal{K}$ that can be used in the loop rewriting, and we can introduce at most one new variable for each role atom $r(t, t)$. Therefore, for each query in $\mathsf{sr}_{\mathcal{K}}(q)$, the number of loop rewritings is again polynomial in $m$ and exponential in $n$. Combined with the results from (2), this bound also holds for the cardinality of $\mathsf{lr}_{\mathcal{K}}(q)$. In a loop rewriting, one role atom is replaced with two role atoms; hence, the size of a query $q' \in \mathsf{lr}_{\mathcal{K}}(q)$ at most doubles.

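4. We can use similar arguments as above in order to derive a bound that is exponential in $n$ and polynomial in $m$ for the number of forest rewritings in $\mathsf{fr}_{\mathcal{K}}(q)$. Since the number of role atoms that we can introduce in a forest rewriting is polynomial in $n$, the size of each query $q' \in \mathsf{fr}_{\mathcal{K}}(q)$ is at most quadratic in $n$.

5. The cardinality of the set $\mathsf{trees}_{\mathcal{K}}(q)$ is clearly also polynomial in $m$ and exponential in $n$ since each query in $\mathsf{fr}_{\mathcal{K}}(q)$ can contribute at most one query to the set $\mathsf{trees}_{\mathcal{K}}(q)$. It is not hard to see that the size of a query $q' \in \mathsf{trees}_{\mathcal{K}}(q)$ is polynomial in $n$.

6. By (1)-(4) above, the number of terms in a root splitting is polynomial in $n$ and there are at most $m$ individual names occurring in $\mathcal{A}$ that can be used for the mapping $\tau$ from terms to individual names. Hence the number of different ground mappings $\tau$ is at most polynomial in $m$ and exponential in $n$. The number of ground queries that a single tuple $(q_{fr}, R) \in \mathsf{fr}_{\mathcal{K}}(q)$ can contribute is, therefore, also at most polynomial in $m$ and exponential in $n$. Together with the bound on the number of forest rewritings from (4), this shows that the cardinality of $\mathsf{ground}_{\mathcal{K}}(q)$ is polynomial in $m$ and exponential in $n$. Again it is not hard to see that the size of each query $q' \in \mathsf{ground}_{\mathcal{K}}(q)$ is polynomial in $n$.

Lemma (23). Let $\mathcal{K}$ be a SHIQ knowledge base and $q$ a union of connected Boolean conjunctive queries. The algorithm from Definition 22 answers "$\mathcal{K}$ entails $q$" iff $\mathcal{K} \models q$ under the unique name assumption.

Proof of Lemma 23. For the "only if" direction: let $q = q_1 \vee \ldots \vee q_\ell$. We show the contrapositive and assume that $\mathcal{K} \not\models q$. We can assume that $\mathcal{K}$ is consistent since an inconsistent knowledge base trivially entails every query. Let $\mathcal{I}$ be a model of $\mathcal{K}$ such that $\mathcal{I} \not\models q$. We show that $\mathcal{I}$ is also a model of some extended knowledge base $\mathcal{K}_q = (\mathcal{T} \cup \mathcal{T}_q, \mathcal{R}, \mathcal{A} \cup \mathcal{A}_q)$.

We first show that $\mathcal{I}$ is a model of $\mathcal{T}_q$. To this end, let $\top \sqsubseteq \neg C$ be in $\mathcal{T}_q$. Then $C(v) \in T$ and $C = \mathsf{con}(q_{fr}, v)$ for some pair $(q_{fr}, \emptyset) \in \mathsf{fr}_{\mathcal{K}}(q_1) \cup \ldots \cup \mathsf{fr}_{\mathcal{K}}(q_\ell)$ and $v \in \mathsf{Vars}(q_{fr})$. Let $i$ be such that $(q_{fr}, \emptyset) \in \mathsf{fr}_{\mathcal{K}}(q_i)$. Now $C^{\mathcal{I}} \ne \emptyset$ implies, by Lemma 12, that $\mathcal{I} \models q_{fr}$ and, by Lemma 18, $\mathcal{I} \models q_i$ and, hence, $\mathcal{I} \models q$, contradicting our assumption. Thus $\mathcal{I} \models \top \sqsubseteq \neg C$ and, thus, $\mathcal{I} \models \mathcal{T}_q$.

Next, we define an extended ABox $\mathcal{A}_q$ such that, for each $q' \in G$,

- if $C(a) \in q'$ and $a^{\mathcal{I}} \in \neg C^{\mathcal{I}}$, then $\neg C(a) \in \mathcal{A}_q$;
- if $r(a, b) \in q'$ and $(a^{\mathcal{I}}, b^{\mathcal{I}}) \notin r^{\mathcal{I}}$, then $\neg r(a, b) \in \mathcal{A}_q$.

Now assume that we can have a query $q' = \mathsf{ground}(q_{fr}, R, \tau) \in \mathsf{ground}_{\mathcal{K}}(q_1) \cup \ldots \cup \mathsf{ground}_{\mathcal{K}}(q_\ell)$ such that there is no atom $at \in q'$ with $\neg at \in \mathcal{A}_q$. Then trivially $\mathcal{I} \models q'$. Let $i$ be such that $(q_{fr}, R) \in \mathsf{fr}_{\mathcal{K}}(q_i)$. By Theorem 19, $\mathcal{I} \models q_i$ and thus $\mathcal{I} \models q$, which is a contradiction. Hence $\mathcal{K}_q$ is an extended knowledge base and $\mathcal{I} \models \mathcal{K}_q$ as required.

For the "if" direction, we assume that $\mathcal{K} \models q$, but the algorithm answers "$\mathcal{K}$ does not entail $q$". Hence there is an extended knowledge base $\mathcal{K}_q = (\mathcal{T} \cup \mathcal{T}_q, \mathcal{R}, \mathcal{A} \cup \mathcal{A}_q)$ that is consistent, i.e., there is a model $\mathcal{I}$ such that $\mathcal{I} \models \mathcal{K}_q$. Since $\mathcal{K}_q$ is an extension of $\mathcal{K}$, $\mathcal{I} \models \mathcal{K}$. Moreover, we have that $\mathcal{I} \models \mathcal{T}_q$ and hence, for each $d \in \Delta^{\mathcal{I}}$, $d \in \neg C^{\mathcal{I}}$ for each $C(v) \in \mathsf{trees}_{\mathcal{K}}(q_1) \cup \ldots \cup \mathsf{trees}_{\mathcal{K}}(q_\ell)$. By Lemma 12, we then have that $\mathcal{I} \not\models q'$ for each $q' \in \mathsf{trees}_{\mathcal{K}}(q_1) \cup \ldots \cup \mathsf{trees}_{\mathcal{K}}(q_\ell)$ and, by Lemma 18, $\mathcal{I} \not\models q_i$ for each $i$ with $1 \le i \le \ell$. By definition of extended knowledge bases, $\mathcal{A}_q$ contains an assertion $\neg at$ for at least one atom $at$ in each query $q' = \mathsf{ground}(q_{fr}, R, \tau)$ from $\mathsf{ground}_{\mathcal{K}}(q_1) \cup \ldots \cup \mathsf{ground}_{\mathcal{K}}(q_\ell)$. Hence $\mathcal{I} \not\models q'$ for each $q' \in \mathsf{ground}_{\mathcal{K}}(q_1) \cup \ldots \cup \mathsf{ground}_{\mathcal{K}}(q_\ell)$. Then, by Theorem 19, $\mathcal{I} \not\models q$, which contradicts our assumption.

Lemma (25). Let $\mathcal{R}$ be a role hierarchy, and $r_1, \ldots, r_n$ roles. For every interpretation $\mathcal{I}$ such that $\mathcal{I} \models \mathcal{R}$, it holds that $(\uparrow(r_1 \sqcap \ldots \sqcap r_n, \mathcal{R}))^{\mathcal{I}} = (r_1 \sqcap \ldots \sqcap r_n)^{\mathcal{I}}$.

Proof of Lemma 25. The proof is a straightforward extension of Lemma 6.19 by Tobies (2001). By definition, $\uparrow(r_1 \sqcap \ldots \sqcap r_n, \mathcal{R}) = \uparrow(r_1, \mathcal{R}) \sqcap \ldots \sqcap \uparrow(r_n, \mathcal{R})$ and, by definition of the semantics of role conjunctions, we have that $(\uparrow(r_1, \mathcal{R}) \sqcap \ldots \sqcap \uparrow(r_n, \mathcal{R}))^{\mathcal{I}} = \uparrow(r_1, \mathcal{R})^{\mathcal{I}} \cap \ldots \cap \uparrow(r_n, \mathcal{R})^{\mathcal{I}}$. If $s \sqsubseteq^*_{\mathcal{R}} r$, then $\{s' \mid r \sqsubseteq^*_{\mathcal{R}} s'\} \subseteq \{s' \mid s \sqsubseteq^*_{\mathcal{R}} s'\}$ and hence $\uparrow(s, \mathcal{R})^{\mathcal{I}} \subseteq \uparrow(r, \mathcal{R})^{\mathcal{I}}$. If $\mathcal{I} \models \mathcal{R}$, then $r^{\mathcal{I}} \subseteq s^{\mathcal{I}}$ for every $s$ with $r \sqsubseteq^*_{\mathcal{R}} s$. Hence, $\uparrow(r, \mathcal{R})^{\mathcal{I}} = r^{\mathcal{I}}$ and

$$(\uparrow(r_1 \sqcap \ldots \sqcap r_n, \mathcal{R}))^{\mathcal{I}} = (\uparrow(r_1, \mathcal{R}) \sqcap \ldots \sqcap \uparrow(r_n, \mathcal{R}))^{\mathcal{I}} = \uparrow(r_1, \mathcal{R})^{\mathcal{I}} \cap \ldots \cap \uparrow(r_n, \mathcal{R})^{\mathcal{I}} = r_1^{\mathcal{I}} \cap \ldots \cap r_n^{\mathcal{I}} = (r_1 \sqcap \ldots \sqcap r_n)^{\mathcal{I}}$$

as required.

Lemma (28). Given a $\mathcal{SHIQ}^{\sqcap}$ knowledge base $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ where $m := |\mathcal{K}|$ and the size of the longest role conjunction is $n$, we can decide consistency of $\mathcal{K}$ in deterministic time $2^{p(m) 2^{p(n)}}$ with $p$ a polynomial.

Proof of Lemma 28. We first translate $\mathcal{K}$ into an ALCQIb knowledge base $\mathsf{tr}(\mathcal{K}, \mathcal{R}) = (\mathsf{tr}(\mathcal{T}, \mathcal{R}), \mathsf{tr}(\mathcal{A}, \mathcal{R}))$. Since the longest role conjunction is of size $n$, the cardinality of each set $\mathsf{tc}(R, \mathcal{R})$ for a role conjunction $R$ is bounded by $m^n$. Hence, the TBox $\mathsf{tr}(\mathcal{T}, \mathcal{R})$ can contain exponentially many axioms in $n$. It is not hard to check that the size of each axiom is polynomial in $m$. Since deciding whether an ALCQIb KB is consistent is an ExpTime-complete problem (even with binary coding of numbers) (Tobies, 2001, Theorem 4.42), the consistency of $\mathsf{tr}(\mathcal{K}, \mathcal{R})$ can be checked in time $2^{p(m) 2^{p(n)}}$.

Lemma (29).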
Let $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ be a SHIQ knowledge base with $m := |\mathcal{K}|$ and $q$ a union of connected Boolean conjunctive queries with $n := |q|$. The algorithm given in Definition 22 decides whether $\mathcal{K} \models q$ under the unique name assumption in deterministic time $2^{p(m) 2^{p(n)}}$.

Proof of Lemma 29. We first show that there is some polynomial $p$ such that we have to check at most $2^{p(m) 2^{p(n)}}$ extended knowledge bases for consistency and then that each consistency check can be done in time $2^{p(m) 2^{p(n)}}$, which gives an upper bound of $2^{p(m) 2^{p(n)}}$ on the time needed for deciding whether $\mathcal{K} \models q$.

Let $q := q_1 \vee \ldots \vee q_\ell$. Clearly, we can use $n$ as a bound for $\ell$, i.e., $\ell \le n$. Moreover, the size of each query $q_i$ with $1 \le i \le \ell$ is bounded by $n$. Together with Lemma 20, we get that $\sharp(T)$ and $\sharp(G)$ are bounded by $2^{p(n) \cdot \log p(m)}$ for some polynomial $p$ and it is clear that the sets can be computed in this time bound as well. The size of each query $q' \in G$ w.r.t. an ABox $\mathcal{A}$ is polynomial in $n$ and, when constructing $\mathcal{A}_q$, we can add a subset of (negated) atoms from each $q' \in G$ to $\mathcal{A}_q$. Hence, there are at most $2^{p(m) 2^{p(n)}}$ extended ABoxes $\mathcal{A}_q$ and, therefore, $2^{p(m) 2^{p(n)}}$ extended knowledge bases that have to be tested for consistency.

Due to Lemma 20 (5), the size of each query $q' \in T$ is polynomial in $n$. Computing a query concept $C_{q'}$ of $q'$ w.r.t. some variable $v \in \mathsf{Vars}(q')$ can be done in time polynomial in $n$. Thus the TBox $\mathcal{T}_q$ can be computed in time $2^{p(n) \cdot \log p(m)}$. The size of an extended ABox $\mathcal{A}_q$ is maximal if we add, for each of the $2^{p(n) \cdot \log p(m)}$ ground queries in $G$, all atoms in their negated form. Since, by Lemma 20 (6), the size of these queries is polynomial in $n$, the size of each extended ABox $\mathcal{A}_q$ is bounded by $2^{p(n) \cdot \log p(m)}$ and it is clear that we can compute an extended ABox in this time bound as well. Hence, the size of each extended KB $\mathcal{K}_q = (\mathcal{T} \cup \mathcal{T}_q, \mathcal{R}, \mathcal{A} \cup \mathcal{A}_q)$ is bounded by $2^{p(n) \cdot \log p(m)}$. Since role conjunctions occur only in $\mathcal{T}_q$ or $\mathcal{A}_q$, and the size of each concept in $\mathcal{T}_q$ and $\mathcal{A}_q$ is polynomial in $n$, the length of the longest role conjunction is also polynomial in $n$.

When translating an extended knowledge base into an ALCQIb knowledge base, the number of axioms resulting from each concept $C$ that occurs in $\mathcal{T}_q$ or $\mathcal{A}_q$ can be exponential in $n$. Thus, the size of each extended knowledge base is bounded by $2^{p(n) \cdot \log p(m)}$. Since deciding whether an ALCQIb knowledge base is consistent is an ExpTime-complete problem (even with binary coding of numbers) (Tobies, 2001, Theorem 4.42), it can be checked in time $2^{p(m) 2^{p(n)}}$ whether $\mathcal{K}$ is consistent or not.

Since we have to check at most $2^{p(m) 2^{p(n)}}$ knowledge bases for consistency, and each check can be done in time $2^{p(m) 2^{p(n)}}$, we obtain the desired upper bound of $2^{p(m) 2^{p(n)}}$ for deciding whether $\mathcal{K} \models q$.

Lemma (31). Let $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ be a SHIQ knowledge base and $q$ a union of Boolean conjunctive queries. $\mathcal{K} \not\models q$ without making the unique name assumption iff there is an $\mathcal{A}$-partition $\mathcal{K}^P = (\mathcal{T}, \mathcal{R}, \mathcal{A}^P)$ and $q^P$ w.r.t. $\mathcal{K}$ and $q$ such that $\mathcal{K}^P \not\models q^P$ under the unique name assumption.

Proof of Lemma 31.
For the "only if" direction: Since $\mathcal{K} \not\models q$, there is a model $\mathcal{I}$ of $\mathcal{K}$ such that $\mathcal{I} \not\models q$. Let $f : \mathsf{Inds}(\mathcal{A}) \to \mathsf{Inds}(\mathcal{A})$ be a total function such that, for each set of individual names $\{a_1, \ldots, a_n\}$ for which $a_1^{\mathcal{I}} = a_i^{\mathcal{I}}$ for $1 \le i \le n$, $f(a_i) = a_1$. Let $\mathcal{A}^P$ and $q^P$ be obtained from $\mathcal{A}$ and $q$ by replacing each individual name $a$ in $\mathcal{A}$ and $q$ with $f(a)$. Clearly, $\mathcal{K}^P = (\mathcal{T}, \mathcal{R}, \mathcal{A}^P)$ and $q^P$ are an $\mathcal{A}$-partition w.r.t. $\mathcal{K}$ and $q$. Let $\mathcal{I}^P = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}^P})$ be an interpretation that is obtained by restricting $\cdot^{\mathcal{I}}$ to individual names in $\mathsf{Inds}(\mathcal{A}^P)$. It is easy to see that $\mathcal{I}^P \models \mathcal{K}^P$ and that the unique name assumption holds in $\mathcal{I}^P$. We now show that $\mathcal{I}^P \not\models q^P$. Assume, to the contrary of what is to be shown, that $\mathcal{I}^P \models^{\pi'} q^P$ for some match $\pi'$. We define a mapping $\pi : \mathsf{Terms}(q) \to \Delta^{\mathcal{I}}$ from $\pi'$ such that $\pi(a) = \pi'(f(a))$ for each individual name $a \in \mathsf{Inds}(q)$ and $\pi(v) = \pi'(v)$ for each variable $v \in \mathsf{Vars}(q)$. It is easy to see that $\mathcal{I} \models^{\pi} q$, which is a contradiction.

For the "if" direction: Let $\mathcal{I}^P = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}^P})$ be such that $\mathcal{I}^P \models \mathcal{K}^P$ under UNA and $\mathcal{I}^P \not\models q^P$, and let $f : \mathsf{Inds}(\mathcal{A}) \to \mathsf{Inds}(\mathcal{A}^P)$ be a total function such that $f(a)$ is the individual that replaced $a$ in $\mathcal{A}^P$ and $q^P$. Let $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$ be an interpretation that extends $\mathcal{I}^P$ such that $a^{\mathcal{I}} = f(a)^{\mathcal{I}^P}$. We show that $\mathcal{I} \models \mathcal{K}$ and that $\mathcal{I} \not\models q$. It is clear that $\mathcal{I} \models \mathcal{T}$. Let $C(a)$ be an assertion in $\mathcal{A}$ such that $a$ was replaced with $a^P$ in $\mathcal{A}^P$. Since $\mathcal{I}^P \models C(a^P)$ and $a^{\mathcal{I}} = f(a)^{\mathcal{I}^P} = (a^P)^{\mathcal{I}^P} \in C^{\mathcal{I}^P}$, $\mathcal{I} \models C(a)$. We can use a similar argument for (possibly negated) role assertions. Let $a \not\doteq b$ be an assertion in $\mathcal{A}$ such that $a$ was replaced with $a^P$ and $b$ with $b^P$ in $\mathcal{A}^P$, i.e., $f(a) = a^P$ and $f(b) = b^P$. Since $\mathcal{I}^P \models a^P \not\doteq b^P$, $a^{\mathcal{I}} = f(a)^{\mathcal{I}^P} = (a^P)^{\mathcal{I}^P} \ne (b^P)^{\mathcal{I}^P} = f(b)^{\mathcal{I}^P} = b^{\mathcal{I}}$ and $\mathcal{I} \models a \not\doteq b$ as required. Therefore, we have that $\mathcal{I} \models \mathcal{K}$ as required.

Assume that $\mathcal{I} \models^{\pi} q$ for a match $\pi$. Let $\pi^P : \mathsf{Terms}(q^P) \to \Delta^{\mathcal{I}}$ be a mapping such that $\pi^P(v) = \pi(v)$ for $v \in \mathsf{Vars}(q^P)$ and $\pi^P(a^P) = \pi(a)$ for $a^P \in \mathsf{Inds}(q^P)$ and some $a$ such that $a^P = f(a)$. Let $C(a^P) \in q^P$ be such that $C(a) \in q$ and $a$ was replaced with $a^P$, i.e., $f(a) = a^P$. By assumption, $\pi(a) \in C^{\mathcal{I}}$, but then $\pi(a) = a^{\mathcal{I}} = f(a)^{\mathcal{I}^P} = (a^P)^{\mathcal{I}^P} = \pi^P(a^P) \in C^{\mathcal{I}^P}$ and $\mathcal{I}^P \models C(a^P)$. Similar arguments can be used to show entailment for role and equality atoms, which yields the desired contradiction.

Theorem (35). Let $\mathcal{K} = (\mathcal{T}, \mathcal{R}, \mathcal{A})$ be a SHIQ knowledge base with $m := |\mathcal{K}|$ and $q := q_1 \vee \ldots \vee q_\ell$ a union of Boolean conjunctive queries with $n := |q|$. The algorithm given in Definition 34 decides in non-deterministic time $p(m_a)$ whether $\mathcal{K} \not\models q$ for $m_a := |\mathcal{A}|$ and $p$ a polynomial.

Proof of Theorem 35. Clearly, the size of an ABox $\mathcal{A}^P$ in an $\mathcal{A}$-partition is bounded by $m_a$. As established in Lemma 32, the maximal size of an extended ABox $\mathcal{A}^P_q$ is polynomial in $m_a$. Hence, $|\mathcal{A}^P \cup \mathcal{A}^P_q| \le p(m_a)$ for some polynomial $p$. Due to Lemma 20 and since the size of $q$, $\mathcal{T}$, and $\mathcal{R}$ is fixed by assumption, the sets $\mathsf{trees}_{\mathcal{K}^P}(q_i)$ and $\mathsf{ground}_{\mathcal{K}^P}(q_i)$ for each $i$ such that $1 \le i \le \ell$ can be computed in time polynomial in $m_a$. From Lemma 29, we know that the translation of an extended knowledge base into an ALCQIb knowledge base is polynomial in $m_a$, and a close inspection of the algorithm by Tobies (2001) for deciding consistency of an ALCQIb knowledge base shows that its runtime is also polynomial in $m_a$.

References

Baader, F., Calvanese, D., McGuinness, D. L., Nardi, D., & Patel-Schneider, P. F. (Eds.). (2003). The Description Logic Handbook. Cambridge University Press.

Bechhofer, S., van Harmelen, F., Hendler, J., Horrocks, I., McGuinness, D. L., Patel-Schneider, P. F., & Stein, L. A. (2004). OWL web ontology language reference. Tech. rep., World Wide Web Consortium. http://www.w3.org/TR/2004/REC-owl-ref-20040210/.

Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., & Rosati, R. (2006). Data complexity of query answering in description logics. In Doherty, P., Mylopoulos, J., & Welty, C. A. (Eds.), Proceedings of the 10th International Conference on Principles of Knowledge Representation and Reasoning (KR 2006), pp. 260-270. AAAI Press/The MIT Press.

Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., & Rosati, R. (2007). Tractable reasoning and efficient query answering in description logics: The DL-Lite family. Journal of Automated Reasoning, 39(3), 385-429.

Calvanese, D., De Giacomo, G., & Lenzerini, M. (1998a). On the decidability of query containment under constraints. In Proceedings of the 17th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS 1998), pp. 149-158. ACM Press and Addison Wesley.

Calvanese, D., De Giacomo, G., Lenzerini, M., Nardi, D., & Rosati, R. (1998b). Description logic framework for information integration. In Proceedings of the 6th International Conference on Principles of Knowledge Representation and Reasoning (KR 1998).

Calvanese, D., Eiter, T., & Ortiz, M. (2007). Answering regular path queries in expressive description logics: An automata-theoretic approach. In Proceedings of the 22nd National Conference on Artificial Intelligence (AAAI 2007).

Chekuri, C., & Rajaraman, A. (1997). Conjunctive query containment revisited. In Proceedings of the 6th International Conference on Database Theory (ICDT 1997), pp. 56-70, London, UK. Springer-Verlag.

Glimm, B., Horrocks, I., & Sattler, U. (2006). Conjunctive query answering for description logics with transitive roles. In Proceedings of the 19th International Workshop on Description Logics (DL 2006). http://www.cs.man.ac.uk/~glimmbx/download/glhs06a.pdf.

Grädel, E. (2001). Why are modal logics so robustly decidable? In Paun, G., Rozenberg, G., & Salomaa, A. (Eds.), Current Trends in Theoretical Computer Science, Entering the 21st Century, Vol. 2, pp. 393-408. World Scientific.

Grahne, G. (1991). Problem of Incomplete Information in Relational Databases. Springer-Verlag.

Horrocks, I., Patel-Schneider, P. F., & van Harmelen, F. (2003). From SHIQ and RDF to OWL: The making of a web ontology language. Journal of Web Semantics, 1(1), 7-26.

Horrocks, I., Sattler, U., Tessaris, S., & Tobies, S. (1999). Query containment using a DLR ABox. LTCS-Report LTCS-99-15, LuFG Theoretical Computer Science, RWTH Aachen, Germany. Available online at http://www-lti.informatik.rwth-aachen.de/forschung/reports.html.

Horrocks, I., Sattler, U., & Tobies, S. (2000). Reasoning with Individuals for the Description Logic SHIQ. In McAllester, D. (Ed.), Proceedings of the 17th International Conference on Automated Deduction (CADE 2000), No. 1831 in Lecture Notes in Artificial Intelligence, pp. 482-496. Springer-Verlag.

Horrocks, I., & Tessaris, S. (2000). A conjunctive query language for description logic ABoxes. In Proceedings of the 17th National Conference on Artificial Intelligence (AAAI 2000), pp. 399-404.

Hustadt, U., Motik, B., & Sattler, U. (2005). Data complexity of reasoning in very expressive description logics. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2005), pp. 466-471.

Levy, A. Y., & Rousset, M.-C. (1998). Combining Horn rules and description logics in CARIN. Artificial Intelligence, 104(1-2), 165-209.

Lutz, C. (2007). Inverse roles make conjunctive queries hard. In Proceedings of the 20th International Workshop on Description Logics (DL 2007).

McGuinness, D. L., & Wright, J. R. (1998). An industrial strength description logic-based configuration platform. IEEE Intelligent Systems, 13(4).

Motik, B., Sattler, U., & Studer, R. (2004). Query answering for OWL-DL with rules. In Proceedings of the 3rd International Semantic Web Conference (ISWC 2004), Hiroshima, Japan.

Ortiz, M., Calvanese, D., & Eiter, T. (2006a). Data complexity of answering unions of conjunctive queries in SHIQ. In Proceedings of the 19th International Workshop on Description Logics (DL 2006).

Ortiz, M. M., Calvanese, D., & Eiter, T. (2006b). Characterizing data complexity for conjunctive query answering in expressive description logics. In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI 2006).

Rosati, R. (2006a). DL+log: Tight integration of description logics and disjunctive datalog. In Proceedings of the Tenth International Conference on Principles of Knowledge Representation and Reasoning (KR 2006), pp. 68-78.

Rosati, R. (2006b). On the decidability and finite controllability of query processing in databases with incomplete information. In Proceedings of the 25th ACM SIGACT SIGMOD Symposium on Principles of Database Systems (PODS-06), pp. 356-365. ACM Press and Addison Wesley.

Rosati, R. (2007a). The limits of querying ontologies. In Proceedings of the Eleventh International Conference on Database Theory (ICDT 2007), Vol. 4353 of Lecture Notes in Computer Science, pp. 164-178. Springer-Verlag.

Rosati, R. (2007b). On conjunctive query answering in EL. In Proceedings of the 2007 Description Logic Workshop (DL 2007). CEUR Workshop Proceedings.

Schaerf, A. (1993). On the complexity of the instance checking problem in concept languages with existential quantification. Journal of Intelligent Information Systems, 2(3), 265-278.

Sirin, E., & Parsia, B. (2006). Optimizations for answering conjunctive ABox queries. In Proceedings of the 19th International Workshop on Description Logics (DL 2006).

Sirin, E., Parsia, B., Cuenca Grau, B., Kalyanpur, A., & Katz, Y. (2006). Pellet: A practical OWL-DL reasoner. Accepted for the Journal of Web Semantics. Available online at http://www.mindswap.org/papers/pelletjws.pdf.

Tessaris, S. (2001). Questions and answers: reasoning and querying in Description Logic. PhD thesis, University of Manchester.

Tobies, S. (2001). Complexity Results and Practical Algorithms for Logics in Knowledge Representation. PhD thesis, RWTH Aachen.

Tsarkov, D., & Horrocks, I. (2006). FaCT++ description logic reasoner: System description. In Furbach, U., & Shankar, N. (Eds.), Proceedings of the Third International Joint Conference on Automated Reasoning (IJCAR 2006), Vol. 4130 of Lecture Notes in Computer Science, pp. 292-297. Springer-Verlag.

van der Meyden, R. (1998). Logical approaches to incomplete information: A survey. In Logics for Databases and Information Systems, pp. 307-356. Kluwer Academic Publishers.

Vardi, M. Y. (1997). Why is modal logic so robustly decidable? In Descriptive Complexity and Finite Models: Proceedings of a DIMACS Workshop, Vol. 31 of DIMACS: Series in Discrete Mathematics and Theoretical Computer Science, pp. 149-184. American Mathematical Society.

Wessel, M., & Möller, R. (2005). A high performance semantic web query answering engine. In Proceedings of the 18th International Workshop on Description Logics.

Wolstencroft, K., Brass, A., Horrocks, I., Lord, P., Sattler, U., Turi, D., & Stevens, R. (2005). A Little Semantic Web Goes a Long Way in Biology. In Proceedings of the 2005 International Semantic Web Conference (ISWC 2005).