
Representing and Querying Correlated Tuples in Probabilistic Databases

Representing and Querying Correlated Tuples in Probabilistic Databases. Prithviraj Sen, Amol Deshpande. Outline: General Info, Introduction, Independent tuples model, Tuple correlations, Representing Dependencies, Query evaluation, Experiments, Conclusions & Work to be done.


Presentation Transcript


  1. Representing and Querying Correlated Tuples in Probabilistic Databases. Prithviraj Sen, Amol Deshpande

  2. outline • General Info • Introduction • Independent tuples model • Tuple correlations • Representing Dependencies • Query evaluation • Experiments • Conclusions & Work to be done

  3. General info • There is high demand for storing uncertain data. • Issues with the use of probabilistic databases: 1) existing probabilistic databases make simplistic assumptions about the data that make them difficult to use in applications that naturally produce correlated data; 2) most probabilistic databases can answer only a restricted subset of the queries that can be expressed in traditional query languages. • To tackle these limitations: a framework that can represent not only probabilistic tuples but also correlations among them.

  4. outline • General Info • Introduction • Independent tuples model • Tuple correlations • Probabilistic graphical models & factored representations • Representing Dependencies • Query evaluation • Experiments • Conclusions & Work to be done

  5. Introduction (1/2) • Database research has primarily concentrated on how to store and query exact data. • Many real-world applications produce large amounts of uncertain data. • Databases need to do more than simply store and retrieve data; they have to help the user sift through the uncertainty and find the results most likely to be the answer.

  6. Introduction (2/2) • Numerous approaches (models) have been proposed to handle uncertainty. • However, most models make assumptions about data uncertainty that restrict their applicability: they cannot easily model or handle dependencies and correlations among tuples.

  7. outline • General Info • Introduction • Independent tuples model • Tuple correlations • Probabilistic graphical models & factored representations • Representing Dependencies • Query evaluation • Experiments • Conclusions & Work to be done

  8. Independent tuples model (1/2) • One of the most commonly used tuple-level uncertainty models: it associates existence probabilities with individual tuples and assumes that the tuples are independent of each other.
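Under this model, each subset of the tuples is one possible world, and its probability is a simple product. The following minimal Python sketch enumerates the possible worlds for three tuples; the names s1, s2, t1 and their probabilities are illustrative assumptions, not values taken from the slides.

```python
from itertools import product

# Illustrative tuple probabilities (assumed; not taken from the slides).
tuple_probs = {"s1": 0.6, "s2": 0.5, "t1": 0.4}

def possible_worlds(probs):
    """Yield (world, probability) pairs under the independent-tuples model.

    A world is the frozenset of tuples that exist in it. Its probability is
    the product of p for every present tuple and (1 - p) for every absent one.
    """
    names = sorted(probs)
    for bits in product([False, True], repeat=len(names)):
        world = frozenset(n for n, present in zip(names, bits) if present)
        p = 1.0
        for n, present in zip(names, bits):
            p *= probs[n] if present else 1.0 - probs[n]
        yield world, p

worlds = list(possible_worlds(tuple_probs))
print(len(worlds))                            # 8 worlds for 3 tuples (2**3)
print(round(sum(p for _, p in worlds), 10))   # 1.0
```

The world count doubles with every added tuple, which is why evaluating queries by enumerating worlds does not scale.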

  9. Independent tuples model (2/2) • Evaluating a query via the set of possible worlds is clearly intractable, as the number of possible worlds grows exponentially (2^n worlds for n uncertain tuples). • Intensional semantics guarantee results in accordance with possible worlds semantics but are computationally expensive. • Extensional semantics are computationally cheaper but do not guarantee results in accordance with possible worlds semantics. • Even though base tuples are independent of each other, the intermediate tuples generated during query evaluation are typically correlated.
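The last point can be made concrete with a small, hypothetical database (the tuples and probabilities are assumptions for illustration): relation S holds s1 (probability 0.6) and s2 (probability 0.5), relation T holds t1 (probability 0.4), and a join-then-project query produces a single result tuple r1 that exists exactly when t1 and at least one of s1, s2 exist. The sketch compares an extensional-style computation, which treats the two intermediate join tuples as independent even though both depend on t1, against summing over the possible worlds directly.

```python
from itertools import product

# Hypothetical base-tuple probabilities (assumed for illustration).
p = {"s1": 0.6, "s2": 0.5, "t1": 0.4}

# Extensional style: multiply probabilities at the join, then combine the two
# intermediate tuples at the projection as if they were independent.
p_i1 = p["s1"] * p["t1"]   # i1 = s1 joined with t1
p_i2 = p["s2"] * p["t1"]   # i2 = s2 joined with t1
extensional = 1 - (1 - p_i1) * (1 - p_i2)

def world_prob(world):
    """Probability of one world of independent base tuples."""
    prob = 1.0
    for name, present in world.items():
        prob *= p[name] if present else 1 - p[name]
    return prob

# Possible-worlds semantics: add up every world in which r1 is derived.
exact = 0.0
for bits in product([False, True], repeat=3):
    world = dict(zip(["s1", "s2", "t1"], bits))
    if world["t1"] and (world["s1"] or world["s2"]):
        exact += world_prob(world)

print(round(extensional, 3), round(exact, 3))  # 0.392 0.32
```

The gap (0.392 vs 0.32) comes entirely from ignoring the shared dependence of i1 and i2 on t1.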

  10. outline • General Info • Introduction • Independent tuples model • Tuple correlations • Probabilistic graphical models & factored representations • Representing Dependencies • Query evaluation • Experiments • Conclusions & Work to be done

  11. Tuple correlations (1/2)

  12. Tuple correlations (2/2) • Although the tuple probabilities associated with s1, s2 and t1 are identical, the query results are drastically different across these four databases. • Since both intensional and extensional semantics assume base-tuple independence, neither can be used directly for query evaluation in such cases.
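The four example databases from the slide are not recoverable from this transcript, so the sketch below uses three joint distributions of its own choosing (assumptions for illustration). All three give s1 probability 0.6 and t1 probability 0.4, with s2 independent at 0.5, yet the probability of a result tuple r1 that exists iff t1 and (s1 or s2) changes with the correlation between s1 and t1.

```python
# Hypothetical joint distributions over (Xs1, Xt1). All share the marginals
# P(s1) = 0.6 and P(t1) = 0.4; s2 is independent with P(s2) = 0.5.
joints = {
    "independent":           {(0, 0): 0.24, (0, 1): 0.16, (1, 0): 0.36, (1, 1): 0.24},
    "mutually exclusive":    {(0, 0): 0.0,  (0, 1): 0.4,  (1, 0): 0.6,  (1, 1): 0.0},
    "positively correlated": {(0, 0): 0.4,  (0, 1): 0.0,  (1, 0): 0.2,  (1, 1): 0.4},
}
P_S2 = 0.5

def result_prob(joint):
    """P(r1), where r1 exists iff t1 and (s1 or s2)."""
    total = 0.0
    for (s1, t1), p_st in joint.items():
        for s2, p_s2 in ((0, 1 - P_S2), (1, P_S2)):
            if t1 and (s1 or s2):
                total += p_st * p_s2
    return total

for name, joint in joints.items():
    # Sanity check: the marginals are the same in every case.
    p_s1 = joint[(1, 0)] + joint[(1, 1)]
    p_t1 = joint[(0, 1)] + joint[(1, 1)]
    print(f"{name}: P(s1)={p_s1:.1f} P(t1)={p_t1:.1f} P(r1)={result_prob(joint):.2f}")
```

Identical per-tuple probabilities therefore do not determine the answer; the joint distribution does.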

  13. outline • General Info • Introduction • Independent tuples model • Tuple correlations • Representing correlations • Query evaluation • Experiments • Conclusions & Work to be done

  14. Representing correlations (1/3) • Associate with every tuple t in the probabilistic database a Boolean-valued random variable Xt. • A factor f(X) is a function of a (small) set of random variables X, where 0 <= f(X) <= 1. • Define factors on (sub)sets of tuple-based random variables to encode correlations. • The probability of an instantiation of the database is given by the product of all the factors.
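A minimal Python sketch of this representation, assuming a factor is stored as a (scope, table) pair; that encoding, and the three single-variable factors (which reproduce the independent-tuples model), are illustrative choices rather than the authors' data structures.

```python
from itertools import product
from math import prod

# Each factor is (scope, table): the scope lists the Boolean tuple variables it
# mentions; the table maps every 0/1 assignment of the scope to a value in [0, 1].
# Hypothetical example: three independent tuples, one single-variable factor each.
factors = [
    (("Xs1",), {(0,): 0.4, (1,): 0.6}),
    (("Xs2",), {(0,): 0.5, (1,): 0.5}),
    (("Xt1",), {(0,): 0.6, (1,): 0.4}),
]

def instantiation_prob(assignment, factors):
    """Probability of one database instantiation: the product of all factors,
    each evaluated on the values its own variables take in `assignment`."""
    return prod(table[tuple(assignment[v] for v in scope)] for scope, table in factors)

variables = ["Xs1", "Xs2", "Xt1"]
total = sum(instantiation_prob(dict(zip(variables, bits)), factors)
            for bits in product([0, 1], repeat=len(variables)))

print(instantiation_prob({"Xs1": 1, "Xs2": 0, "Xt1": 1}, factors))
print(round(total, 10))  # the instantiation probabilities sum to 1.0
```

Correlations are encoded by replacing single-variable factors with factors whose scope spans several tuple variables.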

  15. Representing correlations(2/3) • Suppose we want to represent mutual exclusivity between tuples s1 and t1. In particular, let us try to represent the possible worlds:
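The slide's table of possible worlds did not survive extraction. As a hedged sketch of how a factor can express mutual exclusivity, the code below uses one joint factor over (Xs1, Xt1) that assigns 0 to any world containing both tuples; the specific values, chosen so that s1 has probability 0.6 and t1 has 0.4, and the independent s2 are assumptions for illustration.

```python
from itertools import product

# Hypothetical joint factor over (Xs1, Xt1) encoding mutual exclusivity: any world
# containing both s1 and t1 gets 0. The values are illustrative assumptions chosen
# so that P(s1) = 0.6 and P(t1) = 0.4.
f_s1_t1 = {(0, 0): 0.0, (0, 1): 0.4, (1, 0): 0.6, (1, 1): 0.0}
# s2 stays independent and keeps its own single-variable factor.
f_s2 = {0: 0.5, 1: 0.5}

worlds = {}
for s1, s2, t1 in product([0, 1], repeat=3):
    p = f_s1_t1[(s1, t1)] * f_s2[s2]   # probability = product of the factors
    if p > 0:
        worlds[(s1, s2, t1)] = p

for (s1, s2, t1), p in sorted(worlds.items()):
    present = [name for name, bit in zip(("s1", "s2", "t1"), (s1, s2, t1)) if bit]
    print("{" + ", ".join(present) + "}", p)
```

Only four worlds keep non-zero probability, and none of them contains both s1 and t1.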

  16. Representing correlations(3/3) • Suppose we want to represent positive correlation between t1 and s1. In particular, let us try to represent the possible worlds:
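Positive correlation uses the same kind of joint factor, only with different values. In this hypothetical sketch (values assumed, not from the slides), t1 appears only together with s1, so the joint probability of both exceeds what independence would give while the marginals stay at 0.6 and 0.4.

```python
# Hypothetical factor over (Xs1, Xt1) encoding positive correlation: t1 only
# appears together with s1. Marginals stay P(s1) = 0.6 and P(t1) = 0.4.
f = {(0, 0): 0.4, (0, 1): 0.0, (1, 0): 0.2, (1, 1): 0.4}

p_s1 = f[(1, 0)] + f[(1, 1)]   # marginal of s1
p_t1 = f[(0, 1)] + f[(1, 1)]   # marginal of t1
p_both = f[(1, 1)]             # probability that both tuples exist

print(f"P(s1)={p_s1:.2f} P(t1)={p_t1:.2f}")
print(f"P(s1, t1)={p_both:.2f} vs P(s1)P(t1)={p_s1 * p_t1:.2f}")  # 0.40 vs 0.24
print(f"P(t1 | s1)={p_both / p_s1:.3f}")
```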

  17. Probabilistic graphical model representation • A probabilistic graphical model is a graph whose nodes represent random variables and whose edges represent correlations. [Figure: graphical models over Xs1, Xs2 and Xt1 for three cases: complete independence, mutual exclusivity, positive correlation.]
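One common way to derive such a graph from a factored representation is to connect every pair of variables that appear together in some factor. The sketch below applies that rule to factor scopes for the three cases; the scopes (and therefore an edge between Xs1 and Xt1 in the correlated cases) are assumptions, since the figure itself is not recoverable.

```python
from itertools import combinations

def graph_from_factors(scopes):
    """Return (nodes, edges): an undirected edge joins every pair of
    variables that appear together in some factor's scope."""
    nodes, edges = set(), set()
    for scope in scopes:
        nodes.update(scope)
        for u, v in combinations(sorted(scope), 2):
            edges.add((u, v))
    return nodes, edges

# Hypothetical factor scopes for the three cases named on the slide.
cases = {
    "complete independence": [("Xs1",), ("Xs2",), ("Xt1",)],
    "mutual exclusivity": [("Xs1", "Xt1"), ("Xs2",)],
    "positive correlation": [("Xs1", "Xt1"), ("Xs2",)],
}
for name, scopes in cases.items():
    nodes, edges = graph_from_factors(scopes)
    print(name, sorted(nodes), sorted(edges))
```

The graph records only which variables interact; mutual exclusivity and positive correlation share the same structure and differ only in the factor values.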

  18. Probabilistic graphical model representation [Figure: example graphical model over X1, X2, X3.]

  19. outline • General Info • Introduction • Independent tuples model • Tuple correlations • Probabilistic graphical models & factored representations • Representing Dependencies • Query evaluation • Experiments • Conclusions & Work to be done

  20. Query evaluation: basic idea • Treat intermediate tuples as regular tuples. • Carefully represent correlations between intermediate tuples, base tuples and result tuples to construct a probabilistic graphical model. • Cast the probability computations resulting from query evaluation to inference in probabilistic graphical models.
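A minimal sketch of the first two steps, under stated assumptions: intermediate and result tuples get their own Boolean variables, a join is encoded as a deterministic AND factor, and a projection with duplicate elimination as a deterministic OR factor. The tuple names (s1, s2, t1, i1, i2, r1), their probabilities, and these particular factor encodings are illustrative choices, not taken from the slides. The marginal of r1 is then computed by brute-force summation over all assignments.

```python
from itertools import product

# Illustrative base-tuple probabilities (assumed; not from the slides).
BASE = {"s1": 0.6, "s2": 0.5, "t1": 0.4}

def base_factor(var, p):
    """Single-variable factor for an independent base tuple."""
    return (var,), {(0,): 1 - p, (1,): p}

def and_factor(out, a, b):
    """Join: intermediate tuple `out` exists iff both input tuples exist."""
    return (out, a, b), {(o, x, y): float(o == (x & y))
                         for o, x, y in product([0, 1], repeat=3)}

def or_factor(out, a, b):
    """Projection with duplicate elimination: `out` exists iff either input does."""
    return (out, a, b), {(o, x, y): float(o == (x | y))
                         for o, x, y in product([0, 1], repeat=3)}

factors = [base_factor(v, p) for v, p in BASE.items()]
factors += [
    and_factor("i1", "s1", "t1"),  # i1: s1 joined with t1
    and_factor("i2", "s2", "t1"),  # i2: s2 joined with t1
    or_factor("r1", "i1", "i2"),   # r1: projection merging i1 and i2
]
variables = sorted({v for scope, _ in factors for v in scope})

def joint(assignment):
    """Product of all factors for one full assignment of the variables."""
    p = 1.0
    for scope, table in factors:
        p *= table[tuple(assignment[v] for v in scope)]
    return p

# Marginal of r1 by brute force: sum the joint over all assignments with r1 = 1.
p_r1 = 0.0
for bits in product([0, 1], repeat=len(variables)):
    assignment = dict(zip(variables, bits))
    if assignment["r1"] == 1:
        p_r1 += joint(assignment)
print(round(p_r1, 4))
```

Because the intermediate tuples share the variable t1, their correlation is captured automatically instead of being assumed away.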

  21. Query evaluation: example

  22. Query evaluation: example (probabilistic graphical model) • Query evaluation problem in probabilistic databases: compute the probability of the result tuple, summed over all possible worlds of the database. • Equivalent problem in probabilistic graphical models: marginal probability computation. • Use inference algorithms. [Figure: graphical model over Xs1, Xs2, Xt1, Xi1, Xi2, Xr1.]
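Brute-force summation is exponential in the number of variables, which is why inference algorithms matter. The sketch below uses sum-product variable elimination, one standard exact inference algorithm; the slides do not say which algorithm the authors use, and the model (variables s1, s2, t1, i1, i2, r1 with assumed probabilities) is illustrative.

```python
import operator
from itertools import product

def multiply(f, g):
    """Pointwise product of two factors over the union of their scopes."""
    (fs, ft), (gs, gt) = f, g
    scope = fs + tuple(v for v in gs if v not in fs)
    table = {}
    for bits in product([0, 1], repeat=len(scope)):
        a = dict(zip(scope, bits))
        table[bits] = ft[tuple(a[v] for v in fs)] * gt[tuple(a[v] for v in gs)]
    return scope, table

def sum_out(f, var):
    """Marginalize `var` out of factor f."""
    scope, table = f
    i = scope.index(var)
    new_table = {}
    for bits, val in table.items():
        key = bits[:i] + bits[i + 1:]
        new_table[key] = new_table.get(key, 0.0) + val
    return scope[:i] + scope[i + 1:], new_table

def eliminate(factors, order):
    """Sum-product variable elimination over the variables in `order`."""
    factors = list(factors)
    for var in order:
        related = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        if not related:
            continue
        combined = related[0]
        for f in related[1:]:
            combined = multiply(combined, f)
        factors.append(sum_out(combined, var))
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result

def det(out, a, b, op):
    """Deterministic factor: 1 when out == op(a, b), else 0."""
    return (out, a, b), {(o, x, y): float(o == op(x, y))
                         for o, x, y in product([0, 1], repeat=3)}

# Illustrative model (assumed values): independent base tuples s1, s2, t1,
# join results i1, i2, and the projected result tuple r1.
factors = [
    (("s1",), {(0,): 0.4, (1,): 0.6}),
    (("s2",), {(0,): 0.5, (1,): 0.5}),
    (("t1",), {(0,): 0.6, (1,): 0.4}),
    det("i1", "s1", "t1", operator.and_),
    det("i2", "s2", "t1", operator.and_),
    det("r1", "i1", "i2", operator.or_),
]

scope, table = eliminate(factors, ["s1", "s2", "i1", "i2", "t1"])
p_r1 = table[(1,)] / sum(table.values())   # normalize for safety
print(scope, round(p_r1, 4))
```

Each elimination step only touches the factors that mention the eliminated variable, so the cost depends on the size of the intermediate factors rather than on the total number of possible worlds.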

  23. [Figure: graphical model over Xs2, Xt1, Xi1, Xi2, Xr1.]

  24. Representing probabilistic relations

  25. outline • General Info • Introduction • Independent tuples model • Tuple correlations • Probabilistic graphical models & factored representations • Representing Dependencies • Query evaluation • Experiments • Conclusions & Work to be done

  26. Experiments (1/3) • The database contains 860 publications from CiteSeer [GBL98]. • Queries search for publications by a given (misspelt) author name. • This naturally involves mutual exclusivity correlations.

  27. Experiments (2/3) • Experiments were run on a randomly generated TPC-H dataset of size 10MB. • For each query, the first bar indicates the time taken to run the full query, including all the database operations and the probabilistic computations. • The second bar indicates the time taken to run only the database operations using our Java implementation.

  28. Experiments(3/3) • The result of running an average query over a synthetically generated dataset containing tuples

  29. outline • General Info • Introduction • Independent tuples model • Tuple correlations • Probabilistic graphical models & factored representations • Representing Dependencies • Query evaluation • Experiments • Conclusions & Work to be done

  30. Conclusions • There is an increasing need for database solutions for efficiently managing and querying uncertain data exhibiting complex correlation patterns. • A simple and intuitive framework, based on probabilistic graphical models, is presented for explicitly modeling correlations among tuples in a probabilistic database.

  31. Work to be done • Problem: although the presented approach conceptually allows for capturing arbitrary tuple correlations, exact query evaluation over large datasets exhibiting complex correlations may not always be feasible. • Future considerations: • Develop approximate query evaluation techniques that can be used in such cases. • Develop disk-based query evaluation algorithms so that the techniques can scale to very large datasets.
