Modeling and Language Support for the Management of PBMS • Manolis Terrovitis, Panos Vassiliadis, Spiros Skiadopoulos, Elisa Bertino, Barbara Catania, Anna Maddalena
Outline • Introduction • Modeling of data and patterns • Query operators • Summary and future work
Motivation • Huge amounts of data are produced. • Interesting knowledge has to be detected and extracted. • Knowledge extraction techniques (i.e., Data Mining) are not sufficient: • Huge amounts of results (clusters, association rules, decision trees, etc.) • Arbitrary modeling of results
Motivation (cont'd) • We need to be able to manipulate the knowledge discovered! • The basic requirements: • A generic and homogeneous model for patterns. • Well-defined query operators. • Efficient storage.
The Patterns and PBMS [Rizzi et al., ER 2003] • Patterns are compact, semantically rich representations of raw data. • Clusters, association rules, decision trees, etc. • Pattern Base Management System (PBMS) • Patterns are treated as first-class citizens • Pattern-based queries • Approximate mapping between patterns and raw data
Contributions • We formally define the logical foundations for pattern management • We present a pattern specification language • We introduce queries and query operators
Outline • Introduction • Modeling of data and patterns • Query operators • Summary and future work
PBMS architecture • Pattern Space: • Pattern Types • Pattern Classes • Patterns • Intermediate Results • Data Space
The patterns • Patterns hold information about: • the data source • the structure of the pattern • the relation between the structure and the source, expressed as an approximate logical formula.
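The three components listed above can be pictured as a small record. The following is only a minimal sketch: the class name `Pattern`, its field names, and the disk-shaped cluster are illustrative assumptions, not the representation used in the presented model.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Sequence


@dataclass(frozen=True)
class Pattern:
    """Hypothetical record with the three components named on the slide."""
    source: Sequence[Any]                               # raw data the pattern was extracted from
    structure: Mapping[str, Any]                        # compact description of the pattern
    formula: Callable[[Any, Mapping[str, Any]], bool]   # approximate source/structure relation


def in_disk(point, structure):
    """Assumed formula for a cluster: the point lies inside a disk."""
    (px, py), (cx, cy) = point, structure["center"]
    return (px - cx) ** 2 + (py - cy) ** 2 <= structure["radius"] ** 2


points = [(1.0, 1.0), (1.5, 0.5), (0.5, 1.5)]
cluster = Pattern(
    source=points,
    structure={"center": (1.0, 1.0), "radius": 1.0},
    formula=in_disk,
)
```

Calling `cluster.formula(x, cluster.structure)` for a point `x` asks whether the pattern claims to represent `x`, without consulting the raw data.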
The formula • An intensional description of the pattern-data relation • pros: • Efficiency, more intuitive results • cons: • Loss of accuracy
The formula (cont'd) • The formula is a predicate: fp(x, y), where x ∈ Source, y ∈ Structure • Expressiveness. • Functions and predicates • Safety. • Range restriction. • Queries employing the formula are n-depth domain independent.
Outline • Introduction • Modeling of data and patterns • Query operators • Summary and future work
Query Operators • Query operator classes: • Database operators • Pattern Base operators • Crossover database operators • Crossover pattern base operators
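Crossover Operators • Exact evaluation, via the intermediate mappings • Approximate evaluation, via the formula [Figure: the Data Space and the Pattern Space, linked exactly through the intermediate mappings and approximately through the formula; pattern components shown: PID, data, formula, structure]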
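The difference between the two evaluation modes can be sketched as follows. The row ids, the `intermediate` mapping, and the disk-shaped structure are all hypothetical; the point is only that the formula may admit data that the exact mapping does not list.

```python
# Hypothetical data space: row id -> 2D point.
data = {1: (0.0, 0.0), 2: (0.4, 0.3), 3: (0.9, 0.9), 4: (3.0, 3.0)}

# Hypothetical intermediate mapping: pattern id (PID) -> ids of the rows
# that were actually assigned to the pattern when it was extracted.
intermediate = {"c1": {1, 2}}

# Assumed structure of pattern c1: a disk that approximates the cluster.
structure = {"center": (0.0, 0.0), "radius": 1.5}


def formula(point, s):
    """Approximation formula: the point lies inside the disk."""
    (px, py), (cx, cy) = point, s["center"]
    return (px - cx) ** 2 + (py - cy) ** 2 <= s["radius"] ** 2


def exact(pid):
    """Exact evaluation: look the rows up in the intermediate mapping."""
    return set(intermediate[pid])


def approximate(s):
    """Approximate evaluation: test every row against the formula."""
    return {rid for rid, point in data.items() if formula(point, s)}
```

In this toy setup the approximate answer also contains row 3, which falls inside the disk although the exact mapping does not list it.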
Crossover Operators • Database • Drill-Through: Which data are represented by these patterns? • Data-Covering: Which data from this dataset can be represented by this pattern? • Pattern Base • Pattern-Covering: Which of these patterns represent this dataset?
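A rough sketch of the three operators, using approximate (formula-based) evaluation over interval patterns. The function names, the interval structure, the explicit `dataset` argument, and the reading of "represent this dataset" as "every item satisfies the formula" are assumptions made for illustration.

```python
from typing import List, NamedTuple


class Pattern(NamedTuple):
    """Hypothetical pattern whose structure is a closed interval [low, high]."""
    name: str
    low: float
    high: float


def formula(x: float, p: Pattern) -> bool:
    """Assumed formula: x falls inside the pattern's interval."""
    return p.low <= x <= p.high


def drill_through(patterns: List[Pattern], dataset: List[float]) -> List[float]:
    """Which data are represented by (at least one of) these patterns?"""
    return [x for x in dataset if any(formula(x, p) for p in patterns)]


def data_covering(pattern: Pattern, dataset: List[float]) -> List[float]:
    """Which data from this dataset can be represented by this pattern?"""
    return [x for x in dataset if formula(x, pattern)]


def pattern_covering(patterns: List[Pattern], dataset: List[float]) -> List[Pattern]:
    """Which of these patterns represent this dataset (here: all of its items)?"""
    return [p for p in patterns if all(formula(x, p) for x in dataset)]


dataset = [1.0, 2.5, 4.0, 9.0]
narrow = Pattern("narrow", 0.0, 3.0)
middle = Pattern("middle", 2.0, 5.0)
wide = Pattern("wide", 0.0, 10.0)
```

The first two operators return data and the third returns patterns, which matches the slide's split into database and pattern base crossover operators.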
Query Example • Drill-through({p | p intersects q})
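The query composes a pattern base selection with a crossover operator. A minimal sketch under stated assumptions: patterns are disks, "intersects" means that two disks overlap, and drill-through is evaluated approximately through a containment test. None of these definitions come from the slides.

```python
import math

# Hypothetical pattern base: pattern name -> disk-shaped structure.
patterns = {
    "p1": {"center": (0.0, 0.0), "radius": 1.0},
    "p2": {"center": (1.5, 0.0), "radius": 1.0},
    "p3": {"center": (10.0, 10.0), "radius": 1.0},
}
# Hypothetical data space.
data = [(0.2, 0.1), (1.8, 0.3), (10.2, 9.9), (5.0, 5.0)]


def intersects(a, b):
    """Assumed pattern predicate: two disks overlap."""
    return math.dist(a["center"], b["center"]) <= a["radius"] + b["radius"]


def contains(s, point):
    """Assumed formula: the point lies inside the disk."""
    return math.dist(s["center"], point) <= s["radius"]


def drill_through(selected, points):
    """Data represented by at least one of the selected patterns."""
    return [x for x in points if any(contains(s, x) for s in selected)]


q = {"center": (0.5, 0.0), "radius": 0.4}

# { p | p intersects q } is evaluated inside the pattern space ...
selected = [s for s in patterns.values() if intersects(s, q)]
# ... and drill-through then crosses over to the data space.
result = drill_through(selected, data)
```

With these values p1 and p2 are selected, and `result` contains the first two points.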
Outline • Introduction • Modeling of data and patterns • Query Operators • Summary and future work
Summary • Formal specification of basic PBMS concepts • Investigation of the representation of the pattern-data relation • Formal definition of query operators
Future Work • Query language • Generic similarity measures • Efficient implementation of intermediate mappings • Statistical measures for the patterns.