Data Complexity of Reasoning in Very Expressive Description Logics Ullrich Hustadt Boris Motik Ulrike Sattler Dept. of Computer Science Forschungszentrum Informatik Dept. of Computer Science Univ. of Liverpool, UK Univ. of Karlsruhe, Germany Univ. of Manchester, UK U.Hustadt@csc.liv.ac.uk motik@fzi.de sattler@cs.man.ac.uk Abstract reasoning problems are P-complete in |A|. To develop an intuition and to provide a more detailed account of these re- Data complexity of reasoning in description logics sults, we compare them with similar results for (variants of) (DLs) estimates the performance of reasoning al- datalog [Dantsin et al., 2001]. gorithms measured in the size of the ABox only. Our results are important since they provide a formal justi- We show that, even for the very expressive DL fication for hoping to provide tractable algorithms for very ex- S HI Q, satisfiability checking is data complete for pressive description logics. Furthermore, Horn-S HI Q sub- NP. For applications with large ABoxes, this can sumes DL-lite [Calvanese et al., 2004], a logic aiming to be a more accurate estimate than the usually con- capture most constructs of ER and UML formalisms, while sidered combined complexity, which is E X P T I M E- providing polynomial algorithms for satisfiability checking complete. Furthermore, we identify an expressive and conjunctive query answering (assuming an unbounded fragment, Horn-S HI Q, which is data complete for knowledge base, but a bound on the query size). Horn- P, thus being very appealing for practical usage. S HI Q additionally allows qualified existential quantifica- tion, conditional functionality and role inclusions, while al- 1 Introdcution lowing for a reasoning algorithm that runs in time polynomial in the size of data. Description logics (DLs) [Baader et al., 2003] are state-of- In [Schaerf, 1994], the complexity of concept subsump- the-art knowledge representation formalisms with applica- tion was contrasted with combined and data complexity of tions in many areas of computer science. Very expressive DLs instance checking for various fragments of ALC . This work such as S HI Q are interesting mainly due to their high ex- provides a lower bound for the data complexity of reasoning pressivity combined with the clearly defined model-theoretic in S HI Q. semantics and known formal properties, such as the compu- tational complexity of reasoning. In particular, the combined 2 Preliminaries complexity of checking satisfiability of a S HI Q knowl- edge base KB is E X P T I M E-complete in |KB | [Schild, 1991; Description Logics. Given a set of role names NR , a Tobies, 2001]. S HI Q role is either some R NR or an inverse role R- for R NR . A S HI Q RBox R is a finite set of role inclu- E X P T I M E-completeness is a rather discouraging result sion axioms R S and transitivity axioms Trans(R), for R since |KB | can be large in practice. However, similar to a and S S HI Q roles. For R NR , we set Inv(R) = R- and database, KB consists of a schema part T , called the TBox, Inv(R- ) = R, and assume that R S R (Trans(R) R) and a data or fact part A, called the ABox. For applications implies Inv(R) Inv(S ) R (Trans(Inv(R)) R). A role with a fixed schema but varying data part, data complexity, R is said to be simple if Trans(S ) R, for each S R, measured in the size of A only, provides a more precise per- / where is the reflexive-transitive closure of . formance estimate. Given a set of concept names NC , the set of S HI Q con- Here, we assume that the ABox of KB is extensionally cepts is the minimal set such that each A NC is a S HI Q reduced, i.e. it involves only roles and (possibly negated) concept and, for C and D S HI Q concepts, R a role, S a sim- atomic concepts. Thus, the terminological knowledge is ple role and n a positive integer, then ¬C , C D, R.C and strictly separated from assertional knowledge, so |A| is the n S.C are also S HI Q concepts. We use , , C1 C2 , measure of "raw" data. For such a KB , we show that check- R.C , and n S.C as abbreviations for A ¬A, A ¬A, ing KB satisfiability is NP-complete in |A|, and that in- ¬(¬C1 ¬C2 ), ¬R.¬C , and ¬( (n + 1) S.C ), respec- stance checking is co-NP-complete in |A|. Since this might still lead to intractability, we identify Horn-S HI Q, a frag- tively. Concepts that are not concept names are called com- ment of S HI Q analogous to the Horn fragment of first-order plex. A literal concept is a possibly negated concept name. A TBox T is a finite set of concept inclusion axioms of the logic. Namely, Horn-S HI Q provides existential and univer- form C D. An ABox A is a finite set of axioms C (a), sal quantifiers, but does not provide means to express disjunc- R(a, b), and (in)equalities a b and a b. A knowledge tive information. We show that, for Horn-S HI Q, the basic Table 1: Semantics of S HI Q by Mapping to FOL Table 2: Types of ALC HI Q-clauses ¬R(x, y ) Inv(R)(y , x) 1 Translating Concepts to FOL ¬R(x, y ) S (x, y ) 2 y (A, X ) = A(X ) Pf (x) R(x, f (x)) 3 y (C D, X ) = y (C, X ) y (D, X ) Pf (x) R(f (x), x) Ï 4 y (¬C, X ) = ¬y (C, X ) Ï P1 (x) P2 (f (x)) fi (x)/fj (x) 5 y (R.C, X ) = y : R(X, y ) x (C, y ) y ( n S.C , X ) = Î Î Î P1 (x) P2 (g (x)) P3 (f (g (x))) ti /tj 6 where ti and tj are of the form f (gÏ )) or x Ï (x y1 , . . . , yn : S (X, yi ) x (C, yi ) yi yj 7 P1 (x) ¬R(x, Ï ) P2 (y) yi yj yi Translating Axioms to FOL 8 R(a, b) P(t) ti /tj (C D) = x : y (C, x) y (D, x) where t(i) are a constant b or a functional term fi (a) (R S ) = x, y : R(x, y ) S (x, y ) (Trans(R)) = x, y , z : R(x, y ) R(y , z ) R(x, z ) positive disjunctive datalog program with equality DD(KB ) (C (a)) = y (C, a) (R(a, b)) = R(a, b) which is equisatisfiable with KB . (a b) = a b for {, } A minor obstacle in computing DD(KB ) are the transi- Translating KB to FOL tivity axioms which, in their clausal form, do not contain a Î Î (R) = x, y : R(x, y ) R- (y , x) literal in which all variables of a clause occur. Such clauses (KB ) = RNR (R) T RA () are known to be difficult to handle, so KB is preprocessed X is a meta variable and is substituted by the actual variable. into an equisatisfiable ALC HI Q knowledge base (KB ). x is obtained from y by simultaneously substituting in the Roughly speaking, a transitivity axiom Trans(S ) is replaced definition x(i) for all y(i) , respectively, and y for x . with axioms of the form R.C S.(S.C ), for each R with S R and C a concept occurring in KB . This trans- base KB is a triple (R, T , A). KB is extensionally reduced formation is polynomial, so in the rest of this paper w.l.o.g. if all ABox axioms of KB contain only literal concepts. we assume KB to be an ALC HI Q knowledge base. The semantics of KB is given by translating it into first- The next step is to translate (KB ) into clausal first-order order logic by the operator from Table 1. The main infer- logic. Assuming is defined as in Table 1, ((KB )) is ence problem is checking KB satisfiability, i.e. determining transformed into a set of clauses (KB ) using structural if a first-order model of (KB ) exists. An individual a is transformation to avoid an exponential blowup [Nonnengart an instance of a concept C w.r.t. KB if (KB ) |= y (C, a), and Weidenbach, 2001]. Roughly speaking, the structural which is the case iff KB {¬C (a)} is unsatisfiable. transformation introduces a new name for each complex sub- The logic ALC HI Q is obtained by disallowing transitivity formula of ((KB )). A specialized version of the structural axioms in S HI Q RBoxes, and ALC is obtained by disallow- transformation is presented in detail in Section 4. ing RBoxes, inverse roles and number restrictions. A logic L A core property of (KB ) is that it only contains clauses is between logics L1 and L2 if it contains at least the primi- of one of the forms given in Table 2; such clauses are called tives from L1 and at most the primitives from L2 . ALC HI Q-clauses. For a term t, P(t) denotes a disjunction We measure the size of concepts by their length, and as- of the form (¬)P1 (t) . . . (¬)Pn (t), and P(f (x)) denotes sume unary coding of numbers, i.e. | n R.C | = n + 1 + |C |, a disjunction of the form P1 (f1 (x)) . . . Pn (fm (x)) (no- and use |R(a, b)| = |(¬)A(a)| = 3. tice that this allows each Pi (fi (x)) to contain positive and negative literals). Next, the RBox and TBox clauses of (KB ) are saturated Disjunctive Datalog. A datalog term is a constant or a variable, and a datalog atom has the form A(t1 , . . . , tn ) or by basic superposition [Bachmair et al., 1995; Nieuwenhuis t1 t2 , where ti are datalog terms. A disjunctive datalog and Rubio, 1995]--a clausal equational theorem proving cal- program with equality P is a finite set of rules of the form culus. Due to space limitations, we are unable to present the A1 ... An B1 , ..., Bm where Ai and Bj are datalog rules of basic superposition; it can be considered to be an op- atoms. Each rule is required to be safe, i.e. each variable oc- timized version of the well-known paramodulation calculus. Let be the saturated set of clauses. In this key step, all curring in the rule must occur in at least one body atom. A fact is a rule with m = 0. For the semantics, we take a rule to non-ground consequences of KB are computed. In [Hustadt be equivalent to a clause A1 ... An ¬B1 ... ¬Bm . We et al., 2004b], we have shown the following key property: consider only Herbrand models, and say that a model M of P () an application of a basic superposition inference rule to is minimal if there is no model M of P such that M M . A ALC HI Q-clauses produces an ALC HI Q-clause. The proof ground literal A is a cautious answer of P (written P |=c A) examines all inference rules and clause types. if A is true in all minimal models of P . First-order entailment Furthermore, by examining the types of clauses from Ta- ble 2, one can show the following property: () for a finite coincides with cautious entailment for positive ground atoms. KB , the number of ALC HI Q-clauses unique up to variable renaming is exponential in |KB |. The proof is a straightfor- Reducing KB to Disjunctive Datalog. The results in this ward counting exercise since the number of variables and the paper are based on our algorithm from [Hustadt et al., 2004b]. depth of functional terms in ALC HI Q-clauses are bounded. For a S HI Q knowledge base KB , this algorithm computes a Each inference step can be carried out in polynomial time, so by () and (), after at most exponentially many steps, all complexity of checking satisfiability of a positive disjunctive ALC HI Q-clauses are derived, and saturation terminates. datalog program, and is thus in NP. Intuitively, this is due to nice property of S HI Q that TBox and RBox reasoning does Satisfiability of (KB ) can be decided by further satu- not "interfere" with ABox reasoning, i.e. all non-ground con- rating (A) by basic superposition. Since contains sequences of KB can be computed without taking the ABox all non-ground consequences of (KB ), all remaining infer- into account. Notice that this result holds even for binary ences will produce only ground clauses, and will not involve number coding. clauses of type 4 and 6. These inferences can be simulated Lemma 1 (Membership). For KB an extensionally reduced in a disjunctive datalog program by transforming into a S HI Q knowledge base, satisfiability of KB can be decided function-free clause set, and by introducing new constants in non-deterministic polynomial time in |A|. playing the role of ground functional terms, as described next. We define an operator transforming as follows: (i) each Proof. Let c be the number of constants, f the number of functional term f (a) is replaced with a new, globally unique function symbols in the signature of (KB ), and s the num- constant af ; (ii) each term f (x) is replaced with a new, glob- ber of facts in (KB ) (which is equal to the number of ally unique variable xf ; (iii) for each variable in a clause in- facts in A). By definition of , the number of constants in troduced in step (ii), appends a literal ¬Sf (x, xf ), where DD(KB ) is bounded by 1 = c + cf (cf accounts for con- Sf is a new predicate unique for the function symbol f ; (iv) if stants of the form af ), and the number of facts in DD(KB ) some variable x occurs in a positive, but not in a negative lit- is bounded by 2 = s + c + 2cf (c accounts for facts of the eral in a clause, then the literal ¬HU (x) is appended to the form HU (a), one cf accounts for facts of the form Sf (a, af ), clause; (v) for each function symbol f and each constant a and the other cf accounts for facts of the form HU (af )). All from (KB ), the facts Sf (a, af ), HU (a) and HU (af ) are function symbols are introduced by skolemizing TBox con- added. The set of (function-free) clauses obtained by apply- cepts R.C and n R.C . Since |T | and |R| are constant, f ing to (A) is denoted with FF(KB ). An example of is also a constant, so both 1 and 2 are linear in |A|. applying to a clause of type 5 is shown below. Hence, |DD(KB )| can be exponential in |KB | only be- ¬C (x) D(f (x)) ¬Sf (x, xf ) ¬C (x) D(xf ) cause the non-ground rules in DD(KB ) are obtained from exponentially many clauses of types 1 ­ 7. Since these clause Now each remaining ground inference by basic superposi- types do not contain ABox clauses, the number of clauses ob- tion in (A) can be simulated by a sound inference step in tained after saturation is obviously exponential in |T | + |R| FF(KB ), and vice versa [Hustadt et al., 2004b], so KB and only. Since we assume that the latter is constant, both the FF(KB ) are equisatisfiable. Since FF(KB ) does not contain number of rules in DD(KB ) and their length are bounded functional terms and all its clauses are safe, each clause can by constants, so |DD(KB )| is polynomial in |A|, and can be be rewritten into a positive disjunctive rule; let DD(KB ) be computed from KB in time polynomial in |A|. the resulting set of rules. The following theorem summarizes As KB and DD(KB ) are equisatisfiable, the data com- the properties of DD(KB ): plexity of checking satisfiability of KB follows from the Theorem 1 ([Hustadt et al., 2004b]). For KB an ALC HI Q data complexity of checking satisfiability of DD(KB ), which knowledge base, the following claims hold: (i) KB is un- is NP-complete [Dantsin et al., 2001]: assuming DD(KB ) satisfiable iff DD(KB ) is unsatisfiable; (ii) KB |= iff contains r rules and at most v variables in a rule, the num- DD(KB ) |=c , for of the form A(a) or S (a, b), A a ber of literals in a ground instantiation ground(DD(KB )) concept name, and S a simple role; (iii) KB |= C (a) iff is bounded by r · v + 2 (in each rule, each variable can 1 DD(KB {C Q}) |=c Q(a), for C a complex concept, and be replaced in 1 possible ways). Assuming r and v are Q a new concept name; (iv) the number of rules in DD(KB ) constants, = |ground(DD(KB ))| is polynomial in |A|. is at most exponential in |KB |, the number of literals in each Satisfiability of ground(DD(KB )) can be checked by non- rule is at most polynomial in |KB |, and DD(KB ) can be com- deterministically generating an interpretation of size at most puted in time exponential in |KB |. , and then checking whether it is a model. Both tasks can be performed in polynomial time, so the overall algorithm is Data Complexity of Reasoning in S HI Q 3 obviously non-deterministically polynomial in |A|. For an extensionally reduced S HI Q knowledge base KB , an The hardness of the satisfiability checking problem follows upper bound for the data complexity follows from the reduc- from [Schaerf, 1994, Lemma 4.2.7]. Actually, the lemma tion of KB to DD(KB ). Before presenting the details, we shows co-NP-hardness of instance checking, by a reduction first discuss the intuition behind this result. of satisfiability of 2-2-CNF propositional formulae. The re- By Theorem 1, |DD(KB )| is exponential in |KB |. How- duction produces an extensionally reduced ABox and a single ever, a closer inspection of the reduction algorithm reveals TBox axiom, so it applies in our case as well. Hence, we im- that the exponential blowup is caused by the rules obtained mediately obtain the following result: by saturating ALC HI Q-clauses of types 1 ­ 7, which corre- Theorem 2. Let KB be an extensionally reduced knowledge spond to TBox and RBox, but not to ABox clauses. Hence, base in any logic between ALC and S HI Q. Then (i) de- the size of the rules of DD(KB ) is exponential in the size of ciding KB satisfiability is data complete for NP and (ii) de- TBox and RBox; however, the size of the facts in DD(KB ) ciding whether KB |= (¬)C (a) with |C | bounded is data is linear in the size of the ABox. Therefore, data complex- complete for co-NP. ity of checking satisfiability of DD(KB ) corresponds to data A Horn Fragment of S HI Q 4 Table 3: Definitions of pl+ and pl- Horn logic is a well-known fragment of first-order logic pl+ (D) pl- (D) D where formulae are restricted to clauses containing at most 0 0 one positive literal. The main limitation of Horn logic is its 0 0 inability to represent disjunctive information; however, its A 1 0 È main benefit is the existence of practical refutation proce- pl+ (C ) pl- (C ) ¬CC C È maxi sgn(pl+ (Ci )) dures. Furthermore, data complexity of query answering in - i sgn(pl (Ci )) i + maxi sgn(pl- (Ci )) Horn logic without function symbols is P-complete [Dantsin i sgn(pl (Ci )) i sgn(pl- (C )) 1 R.C et al., 2001], which makes it appealing for practical usage. sgn(pl+ (C )) 1 R.C Following this idea, in this section we identify a Horn frag- (n-1)n + n sgn(pl+ (C )) n R.C 1 ment of S HI Q, where disjunction is traded for P-complete 2 n(n+1) + (n + 1)sgn(pl- (C )) 1 n R.C data complexity. Roughly speaking, in Horn-S HI Q, only C 2 i D are allowed, where each Ci has axioms of the form the form A or R.A, and D has the form A, , R.A, R.A, It is easy to see that, for a concept C without complex sub- n R.A or 1 R. Whereas such a definition succinctly concepts, pl+ (C ) yields the maximal number of positive lit- demonstrates the expressivity of the fragment, in general it is erals in clauses obtained by clausifying x : y (C, x). To too restricting: e.g., the axiom A1 A2 ¬B is not Horn, but clausify a concept C containing a complex subconcept at a it is equivalent to Horn axioms A1 B and A2 B . position p, one should consider if C |p occurs in C under Similarly, a non-Horn axiom A R.(R.B ) can be trans- positive or negative polarity. E.g., in ¬(¬A ¬B ) the con- formed into Horn axioms A R.Q and Q R.B by in- cepts A and B occur effectively positive, and is effectively troducing a new name Q for the subconcept R.B . To avoid . Hence, pl+ (C |p ) (pl- (C |p )) counts the number of pos- dependency on such obvious syntactic transformations, we itive literals used to clausify C |p , provided that C |p occurs give a rather technical definition of Horn-S HI Q. in C under positive (negative) polarity. The function sgn(·) We first adapt the notions of positions and polarity in first- takes into account that C |p will be replaced in C by structural order formulae to DL. A position p is a finite sequence of transformation with only one concept name, even if clausifi- integers; the empty position is denoted with . If a position cation of C |p produces more than one positive literal. E.g., p1 is a proper prefix of a position p2 , then and p1 is above p2 , to clausify C = R.(D1 D2 ), the structural transforma- and p2 is below p1 . For a concept , the subterm at a position tion replaces D1 D2 with an new concept name Q, yielding p, written |p , is defined as follows: | = ; (¬D)|1p = C = R.Q; then clausifying C produces a clause with only D|p ; (D1 D2 )|ip = Di |p for {, } and i {1, 2}; one positive literal. Now a concept C is Horn if the maximal |1 = R and |2p = D|p for = R.D and {, }; and number of positive literals obtained by clausifying subcon- |1 = n, |2 = R and |3p = D|p for = n R.D and cepts of C is at most one. {, }. A replacement of a subterm of at position p If a concept C has a complex subconcept at position p, spe- with a term is defined in the standard way and is denoted cial care has to be taken in introducing a new name for C |p . as [ ]p . For a concept and a position p such that |p is Consider the Horn concept C = R.D1 R.¬D2 . To ap- a concept, the polarity of |p in , denoted as pol(, p), is ply structural transformation to C , one might replace R.D1 defined as follows: and R.¬D2 with new concept names Q1 and Q2 , yielding pol(C, ) = 1; concepts ¬Q1 R.D1 , ¬Q2 R.¬D2 and Q1 Q2 . The pol(C1 C2 , ip) = pol(Ci , p) for {, }, i {1, 2}; problem with such a straight-forward application of structural pol(R.C, 2p) = pol(C, p) for {, }; transformation is that a Horn concept C was reduced to a non- pol( n R.C, 3p) = pol(C, p); Horn concept Q1 Q2 , so the structural transformation de- pol(¬C, 1p) = -pol(C, p); stroyed Horn-ness. To remedy this, we modify the structural pol( n R.C, 3p) = -pol(C, p). transformation to replace each C |p with a literal concept such that clausifying and C |p requires the same number Intuitively, pol(, p) equals 1 if |p occurs in under an even of positive literals. In the above example, this would mean number of explicit and implicit negations, and -1 otherwise. that R.D1 should be replaced with Q1 , but R.¬D2 should Definition 1. In Table 3, we define two mutually recursive be replaced with ¬Q2 , yielding concepts ¬Q1 R.D1 , functions pl+ and pl- , where sgn(0) = 0 and sgn(n) = 1 for Q2 R.¬D2 and Q1 ¬Q2 , which are all Horn. n > 0. For a concept C and a position p of a subconcept in C , Although transitivity axioms are translated by into Horn let pl(C, p) = pl+ (C |p ) if pol(C, p) = 1, and let pl(C, p) = clauses, recall that the algorithm from Section 2 replaces pl- (C |p ) if pol(C, p) = -1. them with axioms of the form R.C S.(S.C ). Now pl+ (R.¬C S.(S.C )) = 1 + pl+ (C ), so if pl+ (C ) > 0, A concept C is a Horn concept if pl(C, p) 1 for each position p of a subconcept in C (including the empty position (KB ) is not a Horn knowledge base. Hence, the presence ). An extensionally reduced ALC HI Q knowledge base KB of transitivity axioms can make a knowledge base non-Horn. is Horn if, for each axiom C D KB , the concept ¬C D Definition 2. Let C be a concept and a function mapping is Horn. An extensionally reduced S HI Q knowledge base C to the set of positions p = of subconcepts of C such that KB is Horn if (KB ) is Horn. C |p is not a literal concept and, for all positions q below p, C|q is a literal concept. Then Def (C ) is defined recursively Lemma 3. If all premises of an inference by basic super- as follows, where = Q if pl(C, p) > 0, and = ¬Q if position contain at most one positive literal, then inference pl(C, p) = 0, with Q a new atomic concept, and ¬(¬Q) = Q: conclusions also contain at most one positive literal. · Def (C ) = {C } if (C ) = , or Proof. (Sketch) Consider a resolution inference with clauses A C and ¬B D, where all literals in C are negative and · if (C ) = , then choose some p (C ) and let { at most one literal in D is positive. Obviously, the number of positive literals in the conclusion C D is equal to ¬ C |p } Def (C []p ) if pol(C, p) = 1 Def (C ) = the number of positive literals in D. Similarly, consider a {¬ ¬C |p } Def (C [¬]p ) if pol(C, p) = -1 paramodulation inference from a clause s t C into a clause A D, where all literals in C and D are negative. Let Cls() denote the set of clauses obtained by clausifying Obviously, the conclusion A [t ]p C D has only one a formula in the standard way and let D positive literal. Similar considerations hold for a paramodu- Def (C ) Cls(x : y (D , x)). Cls(C ) = lation into a negative literal. For an ALC HI Q knowledge base KB , (KB ) is the small- By Lemma 2 and 3, if KB is a Horn-S HI Q knowledge est set of clauses such that: (i) for each role name R NRa , base, then DD(KB ) is a non-disjunctive program. This is Cls( (R)) (KB ); (ii) for each RBox or ABox axiom in enough for the following result: KB , Cls( ()) (KB ); (iii) for each TBox axiom C D Theorem 3. For KB an extensionally reduced Horn knowl- in KB , Cls(¬C D) (KB ). edge base in any logic between ALC and S HI Q, deciding By [Nonnengart and Weidenbach, 2001], x : y (C, x) D KB (un)satisfiability, and deciding whether KB |= (¬)C (a) Def (C ) x : y (D , x) are equisatisfiable, so (KB ) and with |C | bounded, is P-complete in |A|. and (KB ) are equisatisfiable as well. Proof. Membership in P is a consequence of the fact that Lemma 2. For a Horn-S HI Q knowledge base KB , each DD(KB ) is a non-disjunctive program, whose satisfiability clause from (KB ) contains at most one positive literal. can be checked in polynomial time [Dantsin et al., 2001]. For hardness, consider the well-known P-complete prob- Proof. We first show the following property (*): for a Horn lem of deciding whether a path from a node a1 to a node concept C , all concepts in Def (C ) are Horn concepts. The an in a graph G exists [Papadimitriou, 1993]. For a graph proof is by induction on the recursion depth in Def . The in- G, let KB G be a knowledge base containing the assertions duction base for (C ) = is obvious. Consider an applica- edge (a, b) and edge (b, a) for each edge a, b in G, the ax- tion of Def (C ), where C is a Horn concept and p a position ioms C (a1 ) and ¬C (an ), and the TBox axiom C edge .C . of a subconcept of C , such that C |p is not a literal concept Obviously, a1 is reachable from an if and only if KB G is un- and, for each position q below p, C |q is a literal concept. satisfiable, thus implying P-completeness of unsatisfiability Observe that in all cases, we have pl+ () = pl(C, p) and checking. The other inference problems can be reduced to pl+ (¬) = 1 - pl(C, p). We now consider two cases, de- unsatisfiability as usual. pending on pol(C, p): · pol(C, p) = 1. Now we have pl+ (¬ C |p ) = 5 Discussion pl+ (¬) + pl+ (C |p ) = pl+ (¬) + pl(C, p) = 1. Fur- To better understand the results from the previous two sec- thermore, pl(C, p) = pl(C []p , p), so C []p is Horn. tions, we contrast them with well-known results for (disjunc- · pol(C, p) = -1. Now we have pl+ (¬ ¬C |p ) = tive) datalog [Dantsin et al., 2001]. Since datalog has been pl+ (¬) + pl- (C |p ) = pl+ (¬) + pl(C, p) = 1. Fur- successfully applied in practice, this analysis gives interest- ing insights into the practical applicability of DLs. thermore, pl(C, p) = pl(C [¬]p , p), so C [¬]p is Horn. Interestingly, the data complexity of datalog variants and of Hence, each application of the operator Def decomposes a corresponding S HI Q fragments coincide. Namely, without Horn concept C into two simpler Horn concepts, so (*) holds. disjunctions, a S HI Q knowledge base and a datalog program Furthermore, for each C |p or ¬C |p in the definition of Def , always have at most one model, which can be computed in each immediate subconcept is a literal. polynomial time. With disjunctions, several models are pos- For D Def (C ), by definition of from Table 1, it is easy sible, and this must be dealt with using reasoning-by-cases. to see that pl+ (D) gives the maximal number of positive lit- Intuitively, one needs to "guess" a model, which increases erals occurring in a clause from Cls(x : y (D, x)). Thus, if data complexity to NP. C is a Horn concept, all clauses from Cls(C ) contain at most The key difference between datalog and DLs is revealed one positive literal. Finally, clauses obtained by translating by considering the effects that various parameters have on the RBox and ABox axioms of (KB ) also contain at most one complexity. For a datalog program P and a ground atom , v positive literal. checking whether P |= can be performed in time O(|P | ), where v is the maximal number of distinct variables in a rule As stated by the following lemma, a basic superposition of P [Vardi, 1995]. Namely, the problem can be solved by inference, when applied to Horn premises, produces a Horn grounding P , i.e. by replacing, in each rule of P , all variables conclusion. The full proof of the lemma is given in [Hustadt with individuals from P in all possible ways. The size of the et al., 2004a]. v Acknowledgments grounding is bounded by |P | , and propositional Horn logic is P-complete, giving the above estimate. Now in general, v We thank Enrico Franconi for a stimulating discussion which is linear in |P |, so the size of the grounding is exponential; lead us to the results presented in this paper. thus, the combined complexity of datalog coincides with the combined complexity of S HI Q. However, in practical ap- References plications v is usually small, so it makes sense to assume it [Baader et al., 2003] F. Baader, D. Calvanese, D. McGuin- is bounded. Under this assumption, datalog actually exhibits ness, D. Nardi, and P. Patel-Schneider, editors. The De- polynomial behavior. scription Logic Handbook. Cambridge University Press, By an analogy, one might try to limit the length of con- 2003. cepts in axioms or the number of variables. For the for- mer, structural transformation can be used to polynomially [Bachmair et al., 1995] L. Bachmair, H. Ganzinger, reduce "long" axioms with complex concepts to "short" ax- C. Lynch, and W. Snyder. Basic Paramodulation. ioms with just elementary concepts. For the latter, we note Information and Computation, 121(2):172­192, 1995. that DLs are closely related to the two-variable fragment of [Calvanese et al., 2004] D. Calvanese, G. De Giacomo, first-order logic: e.g., ALC concepts correspond to first-order M. Lenzerini, R. Rosati, and G. Vetere. DL-Lite: Prac- formulae with only two variables regardless of nesting (see, tical Reasoning for Rich DLs. In Proc. DL 2004, volume e.g., [Baader et al., 2003, ch. 4]). Therefore, the number of 104 of CEUR Workshop Proceedings, 2004. variables in DL axioms is "intrinsically" bounded (assuming [Dantsin et al., 2001] E. Dantsin, T. Eiter, G. Gottlob, and a bound on the numbers occurring in number restrictions). A. Voronkov. Complexity and expressive power of logic Hence, neither restriction actually reduces complexity. We summarize our discussion as follows: assuming a programming. ACM Comp. Surveys, 33(3):374­425, 2001. bound on the axiom length, but not on the number of axioms, [Hustadt et al., 2004a] U. Hustadt, B. Motik, and U. Sattler. satisfiability checking in datalog is (non-deterministically) Reasoning in Description Logics with a Concrete Domain polynomial, but in DLs it is exponential. The reason for this is in the Framework of Resolution. Technical Report 2-8- that DLs such as ALC provide existential and universal quan- 1/04, FZI, Karlsruhe, Germany, February 2004. tification and general inclusion axioms, which can be used to http://www.fzi.de/wim/publikationen.php?id=1144. succinctly encode models with paths of exponential length. [Hustadt et al., 2004b] U. Hustadt, B. Motik, and U. Sattler. The saturation step eliminates function symbols introduced Reducing S HI Q- Description Logic to Disjunctive Dat- by existential quantification, but it also incurs an exponential alog Programs. In Proc. KR2004, pages 152­162. AAAI blowup in the program size to account for such paths. Hence, Press, 2004. although combined complexity of both datalog and DLs is exponential, the reasons for this are different. [Nieuwenhuis and Rubio, 1995] R. Nieuwenhuis and A. Ru- In [Baader et al., 2003, ch. 5] two sources of complexity in bio. Theorem Proving with Ordering and Equality Con- DLs have been identified: OR-branching caused by the exis- strained Clauses. Journal of Logic and Computation, tence of several possible models, and AND-branching caused 19(4):312­351, April 1995. by the existence of paths within a model. Our results show [Nonnengart and Weidenbach, 2001] A. Nonnengart and that OR-branching is not so "bad" as AND-branching: the C. Weidenbach. Computing Small Clause Normal Forms. former incurs "only" an increase to NP, whereas the latter In A. Robinson and A. Voronkov, editors, Handbook incurs an increase in complexity to E X P T I M E. of Automated Reasoning, volume I, chapter 6, pages 335­367. Elsevier Science, 2001. 6 Conclusion [Papadimitriou, 1993] C. H. Papadimitriou. Computational In many application of DLs, the TBox can be assumed to Complexity. Addison-Wesley, 1993. be rather stable -- like a database schema -- whereas the [Schaerf, 1994] A. Schaerf. Query Answering in Concept- ABox is varying and possibly very large -- like a database Based Knowledge Representation Systems: Algorithms, extension. Hence, we study the complexity of reasoning in ` Complexity, and Semantic Issues. PhD thesis, Universita expressive DLs measured in the size of the ABox. In partic- degli studi di Roma "La Sapienza", 1994. ular, we show that checking satisfiability of a S HI Q knowl- [Schild, 1991] K. Schild. A correspondence theory for termi- edge base is NP-complete, and that checking unsatisfiability and instance checking are co-NP-complete in the size of the nological logics: preliminary report. In Proc. IJCAI `91, ABox. Furthermore, we identify Horn-S HI Q, a fragment pages 466­471, Sidney, Australia, 1991. of S HI Q which, analogously to Horn logic, does not allow [Tobies, 2001] S. Tobies. Complexity Results and Practical to represent disjunctive knowledge, and for which the basic Algorithms for Logics in Knowledge Representation. PhD reasoning problems are P-complete in the size of theABox. thesis, RWTH Aachen, Germany, 2001. Our results indicate that reasoning with large ABoxes may [Vardi, 1995] M. Y. Vardi. On the Complexity of Bounded- be feasible if the TBox is not "too" large and if we stay within Variable Queries. In Proc. PODS 1995, pages 266­276. Horn-S HI Q. To verify these assumptions, we are currently ACM Press, 1995. implementing our algorithms and plan to conduct a thorough performance analysis.