Sensitivity analysis and explanations for robust query evaluation in probabilistic databases

TitleSensitivity analysis and explanations for robust query evaluation in probabilistic databases
Publication TypeConference Papers
Year of Publication2011
AuthorsKanagal B, Li J, Deshpande A
Conference NameProceedings of the 2011 international conference on Management of data
Date Published2011///
Abstract

Probabilistic database systems have successfully established them-selves as a tool for managing uncertain data. However, much of the
research in this area has focused on efficient query evaluation and
has largely ignored two key issues that commonly arise in uncer-
tain data management: First, how to provide explanations for query
results, e.g., “Why is this tuple in my result?” or “Why does this
output tuple have such high probability?”. Second, the problem of
determining the sensitive input tuples for the given query, e.g., users
are interested to know the input tuples that can substantially alter
the output, when their probabilities are modified (since they may
be unsure about the input probability values). Existing systems pro-
vide the lineage/provenance of each of the output tuples in addition
to the output probabilities, which is a boolean formula indicating
the dependence of the output tuple on the input tuples. However,
lineage does not immediately provide a quantitative relationship
and it is not informative when we have multiple output tuples. In
this paper, we propose a unified framework that can handle both
the issues mentioned above to facilitate robust query processing.
We formally define the notions of influence and explanations and
provide algorithms to determine the top-l influential set of variables
and the top-l set of explanations for a variety of queries, including
conjunctive queries, probabilistic threshold queries, top-k queries
and aggregation queries. Further, our framework naturally enables
highly efficient incremental evaluation when input probabilities are modified (e.g., if uncertainty is resolved). Our preliminary exper-
imental results demonstrate the benefits of our framework for per-
forming robust query processing over probabilistic databases.