On Querying Incomplete Information in Databases under Bag Semantics

On Querying Incomplete Information in Databases under Bag Semantics

Marco Console, Paolo Guagliardo, Leonid Libkin

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence
Main track. Pages 993-999. https://doi.org/10.24963/ijcai.2017/138

Querying incomplete data is an important task both in data management, and in many AI applications that use query rewriting to take advantage of relational database technology. Usually one looks for answers that are certain, i.e., true in every possible world represented by an incomplete database. For positive queries, expressed either in positive relational algebra or as unions of conjunctive queries, finding such answers can be done efficiently when databases and query answers are sets. Real-life databases however use bag, rather than set, semantics. For bags, instead of saying that a tuple is certainly in the answer, we have more detailed information: namely, the range of the numbers of occurrences of the tuple in query answers. We show that the behavior of positive queries is different under bag semantics: finding the minimum number of occurrences can still be done efficiently, but for maximum it becomes intractable. We use these results to investigate approximation schemes for computing certain answers to arbitrary first-order queries that have been proposed for set semantics. One of them cannot be adapted to bags, as it relies on the intractable maxima of occurrences, but another scheme only deals with minima, and we show how to adapt it to bag semantics without losing efficiency.
Keywords:
Knowledge Representation, Reasoning, and Logic: Computational Complexity of Reasoning
Multidisciplinary Topics and Applications: Database Systems
Knowledge Representation, Reasoning, and Logic: Logics for Knowledge Representation