Using Taxonomies to Perform Aggregated Querying over Imprecise Data

Atanu Roy, Chandrima Sarkar, Rafal A. Angryk
Presented by: Rafal A. Angryk
Date: 2010-12-14

Outline of the Presentation
• Idea
• Imprecision
• Motivation
• Limitations of Previous Work
• Definitions
• Approach
• Experimental Setup & Results
• Conclusion and Future Work
Roy, Sarkar, Angryk. Using Taxonomies to Perform Aggregated Querying over Imprecise Data

Idea of the Project
• This paper provides a framework for answering queries over imprecise data found in common databases.
• We propose to solve this problem by classifying the data into taxonomical hierarchies and then capturing them in a weighted hierarchical hypergraph.

Imprecision in Databases: An Example

Constraint: All soybean seeds with the same kind of stem canker should germinate in the same month of the season.

Motivation
• Several recent papers have focused on the retrieval of imprecise data, where every fact can be a region, instead of a point, in a multi-dimensional space.
• The most prominent one is [BDRV07].
• Its authors solve the problem by constructing marginal databases (MDBs) from extended databases (EDBs) with the help of a constraint hypergraph.

Limitations of Previous Work
• Creating marginal databases using a weighted hierarchical hypergraph employs a brute-force method for retrieving connected facts (tuples).
• This increases the overall time complexity and the processing time of the queries.
• [BDRV07] follows a data-specific technique, whereas we propose to rely on domain-specific knowledge.

Definitions
• Background knowledge: Knowledge required to generate taxonomies.
• Expert knowledge: Domain-specific human expertise.
• Data-derived knowledge: Knowledge derived from a historic precise database and used to generate mutually exclusive probabilities.
• Possible worlds: All the possible combinations of values that an imprecise record can assume.
• Valid worlds: All the possible worlds that satisfy a given set of constraints.
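The difference between possible worlds and valid worlds can be illustrated with a minimal Python sketch. The attribute names, the candidate values, and the per-record constraint below are hypothetical and chosen only for illustration; they are not taken from the authors' dataset or implementation.

```python
from itertools import product

# Hypothetical imprecise record: each attribute lists the precise values
# it could take (a single-element list means the attribute is precise).
imprecise_record = {
    "stem_canker": ["above-sec-node"],
    "month": ["july", "august"],
    "seed_size": ["small", "large"],
}

def possible_worlds(record):
    """Enumerate every combination of precise values the record can assume."""
    names = list(record)
    return [dict(zip(names, combo)) for combo in product(*record.values())]

def valid_worlds(worlds, constraints):
    """Keep only the possible worlds that satisfy every constraint."""
    return [w for w in worlds if all(check(w) for check in constraints)]

# Hypothetical constraint: plants with this stem canker germinate in August.
constraints = [
    lambda w: w["stem_canker"] != "above-sec-node" or w["month"] == "august",
]

worlds = possible_worlds(imprecise_record)
valid = valid_worlds(worlds, constraints)
print(len(worlds), "possible worlds,", len(valid), "valid worlds")
```

In this toy record the two imprecise attributes yield four possible worlds, and the constraint leaves only the two in which the month is August.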

Assignment of Probabilities

EDB Creation
• The probability of a possible world is the product of the unconditional occurrences of all imprecise attributes.
• The sum of the probabilities of all possible worlds of an imprecise record is 1.
• Probability assignment rule creates a set of tuples using
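The first two rules above can be checked with a short Python sketch. It assumes that each imprecise attribute comes with a probability distribution over its candidate values (for example, one derived from a historic precise database) and that the attributes are combined by simple multiplication. The attribute names and probability values are hypothetical.

```python
from itertools import product
from math import prod, isclose

# Hypothetical per-attribute probabilities for one imprecise record.
# Each attribute's candidate values are mutually exclusive and sum to 1.
attribute_probs = {
    "month": {"july": 0.3, "august": 0.7},
    "seed_size": {"small": 0.4, "large": 0.6},
}

def world_probabilities(attr_probs):
    """Map each possible world (a tuple of values) to its probability,
    computed as the product of the per-attribute probabilities."""
    names = list(attr_probs)
    worlds = {}
    for combo in product(*(attr_probs[n].items() for n in names)):
        values = tuple(value for value, _ in combo)
        worlds[values] = prod(p for _, p in combo)
    return worlds

edb_rows = world_probabilities(attribute_probs)
total = sum(edb_rows.values())

for values, p in edb_rows.items():
    print(values, round(p, 4))
print("sum is 1:", isclose(total, 1.0))
```

Because every per-attribute distribution sums to 1, the products over all combinations also sum to 1, which matches the second rule.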

Hyperedge Creation

MDB Creation
• A weighted hierarchical hypergraph is defined as H(L, E), where L represents the nodes and E is the set of hyperedges between different taxonomies.
• Each hyperedge signifies a distinct combination of attribute values. The weight of a possible world assigned to a hyperedge [AC10] needs to preserve a few properties.
• All t-norms [AC10] (e.g. minimum, product) fulfill these requirements. We choose the product for the purposes of our preliminary investigation.
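To make the choice of t-norm concrete, the following Python sketch combines a list of weights with either the product or the minimum t-norm. How the authors actually derive the individual weights is not stated on the slide, so the function `hyperedge_weight` and the input values are assumptions made purely for illustration.

```python
from functools import reduce

def product_tnorm(a, b):
    """Product t-norm: T(a, b) = a * b."""
    return a * b

def minimum_tnorm(a, b):
    """Minimum t-norm: T(a, b) = min(a, b)."""
    return min(a, b)

def hyperedge_weight(node_weights, tnorm=product_tnorm):
    """Combine the weights in [0, 1] of the nodes joined by a hyperedge.
    The value 1.0 is the neutral element of every t-norm."""
    return reduce(tnorm, node_weights, 1.0)

# Hypothetical weights of three taxonomy nodes connected by one hyperedge.
weights = [0.9, 0.5, 0.8]

print("product:", hyperedge_weight(weights))
print("minimum:", hyperedge_weight(weights, minimum_tnorm))
```

For inputs in [0, 1] the product never exceeds the minimum, so the product is the more conservative of the two weightings.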

EDB  MDB Roy, Sarkar, Angryk. Using Taxonomies to Perform Aggregated Querying over Imprecise Data

Aggregated Querying
• We aggregate tuples for aggregated querying based on their uniqueness.
• Two tuples are grouped only when all their attribute values and the corresponding probabilities are the same.
• Example: find the total number of plants grown in August that have Stem Canker above-sec-node.
• (44*0.9057) + (25*0.6429) ≈ 56
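The arithmetic in the example can be reproduced with a small Python sketch. It assumes that 44 and 25 are the sizes of two groups of identical tuples and that 0.9057 and 0.6429 are the probabilities attached to those groups; the tuple layout is hypothetical.

```python
from collections import Counter

# Each tuple: (month, stem_canker, probability). The counts and the
# probabilities come from the slide; the tuple layout is an assumption.
tuples = (
    [("august", "above-sec-node", 0.9057)] * 44
    + [("august", "above-sec-node", 0.6429)] * 25
)

# Group tuples only when all attribute values and probabilities match.
groups = Counter(tuples)

# Expected count: each group contributes (group size * probability).
expected = sum(count * row[2] for row, count in groups.items())

print(len(groups), "groups")
print("expected number of plants:", round(expected))
```

The two groups contribute 39.8508 and 16.0725 respectively, which sum to 55.9233 and round to the 56 reported on the slide.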

Experimental Setup
• Census-Income dataset from the UCI Machine Learning Repository.
• 7 dimensions were used in the final experiments.
• The precise database has 191239 records.
• The test dataset has 99762 records.
• Imprecision was randomly inserted into the test dataset to make it imprecise.

Distribution of Imprecision

Imprecision Characteristics

Scalability Test

Extended Database Analysis

Influence of Imprecision

Absolute Percentage Error

Conclusion and Future Work
• In this research we present a framework for efficient querying over imprecise data, with an average accuracy of ≈ 94%.
• We intend to extend this research to use ontologies in place of taxonomies.
• We also intend to use Associative Weight Mining to assign weights to hyperedges.

Questions?
References
• [BDRV07]: Douglas Burdick, AnHai Doan, Raghu Ramakrishnan, Shivakumar Vaithyanathan: OLAP over Imprecise Data with Domain Constraints. VLDB 2007: 39-50
• [AC10]: Rafal A. Angryk, Jacek Czerniak: Heuristic Algorithm for Interpretation of Multi-Valued Attributes in Similarity-based Fuzzy Relational Databases. International Journal of Approximate Reasoning 51: 895-911 (2010)
