This academic paper discusses the modeling of data and patterns, query operators, and PBMS architecture for efficient knowledge extraction and manipulation. It introduces a pattern specification language, query examples, and evaluates approximate and exact evaluation methods for Pattern Base Management Systems.
Modeling and Language Support for the Management of PBMS
Manolis Terrovitis, Panos Vassiliadis, Spiros Skiadopoulos, Elisa Bertino, Barbara Catania, Anna Maddalena
Outline • Introduction • Modeling of data and patterns • Query operators • Summary and future work
Motivation • Huge amounts of data are produced. • Interesting knowledge has to be detected and extracted. • Knowledge extraction techniques (i.e., data mining) are not sufficient: • Huge amounts of results (clusters, association rules, decision trees, etc.) • Arbitrary modeling of results
Motivation (cont'd) • We need to be able to manipulate the knowledge discovered! • The basic requirements: • A generic and homogeneous model for patterns. • Well-defined query operators. • Efficient storage.
The Patterns and PBMS [Rizzi et al., ER 2003] • Patterns are compact, semantically rich representations of raw data. • Clusters, association rules, decision trees, etc. • Pattern Base Management System (PBMS) • Patterns are treated as first-class citizens • Pattern-based queries • Approximate mapping between patterns and raw data
Contributions • We formally define the logical foundations for pattern management • We present a pattern specification language • We introduce queries and query operators
PBMS architecture • Pattern Space: • Pattern Types • Pattern Classes • Patterns • Intermediate Results • Data Space
The patterns • Patterns hold information about: • the data source • the structure of the pattern • the relation between the structure and the source, expressed as an approximate logical formula.
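To make the three components concrete, here is a minimal Python sketch of a pattern record. The class name `Pattern`, the field names, the `age` attribute, and the interval-shaped structure are all assumptions made for illustration; the slides do not prescribe a concrete representation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, float]  # one raw-data record (assumed representation)


@dataclass
class Pattern:
    pid: int                                        # pattern identifier
    source: List[Row]                               # the data source
    structure: Dict[str, Any]                       # the structure of the pattern
    formula: Callable[[Row, Dict[str, Any]], bool]  # approximate structure-source relation


def in_interval(x: Row, y: Dict[str, Any]) -> bool:
    """Hypothetical formula: a record belongs to the pattern if its age lies in [low, high]."""
    return y["low"] <= x["age"] <= y["high"]


rows = [{"age": 23.0}, {"age": 31.0}, {"age": 58.0}]
p = Pattern(pid=1, source=rows, structure={"low": 20.0, "high": 35.0}, formula=in_interval)

# Records of the source that the formula maps to this pattern
represented = [x for x in p.source if p.formula(x, p.structure)]
```

Keeping the formula as a callable next to the structure means the pattern-data relation can be evaluated without storing an explicit list of member records.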
The formula • An intensional description of the pattern-data relation • pros: • Efficiency, more intuitive results • cons: • Reduced accuracy, since the formula only approximates the relation
The formula (cont'd) The formula is a predicate: fp(x, y), where x ∈ Source and y ∈ Structure • Expressiveness. • Functions and predicates • Safety. • Range restriction. • Queries employing the formula are n-depth domain independent.
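As an illustration of what such a predicate could look like, consider a circular cluster over two attributes. The attribute names a and b and the structure components c_a, c_b and r are assumptions chosen for this example, not taken from the slides. Both variables are tied to the source and the structure respectively, which is the kind of binding that range restriction asks for.

```latex
% Illustrative only: a circular cluster over two attributes a and b.
% x ranges over the source tuples, y over the structure
% (center coordinates c_a, c_b and radius r).
f_p(x, y) \;\equiv\; (x.a - y.c_a)^2 + (x.b - y.c_b)^2 \le y.r^2,
\qquad x \in \mathit{Source},\; y \in \mathit{Structure}
```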
Query Operators • Query operator classes: • Database operators • Pattern Base operators • Crossover database operators • Crossover pattern base operators
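Crossover Operators • Exact evaluation, via the intermediate mappings • Approximate evaluation, via the formula • [Figure: the Data Space and the Pattern Space, linked exactly through the intermediate mappings (PID, data) and approximately through the formula over the pattern structure]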
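The difference between the two evaluation modes can be sketched in a few lines of Python. Everything here is assumed for illustration: raw data are numbers keyed by a row id, the intermediate mappings are stored as (pid, row id) pairs, and the pattern structure is an interval used as the formula.

```python
# Raw data keyed by row id (assumed representation)
data = {1: 2.0, 2: 4.5, 3: 5.5, 4: 9.0}

# Structure of a hypothetical cluster pattern with pid 7: an interval
structure = {"low": 4.0, "high": 6.0}

# Intermediate mappings recorded when the pattern was extracted.
# Row 4 was assigned to the cluster although it lies outside the interval.
mappings = {(7, 2), (7, 3), (7, 4)}


def exact(pid):
    """Exact evaluation: look the members up in the intermediate mappings."""
    return sorted(rid for (p, rid) in mappings if p == pid)


def approximate(struct):
    """Approximate evaluation: test every row against the formula."""
    return sorted(rid for rid, v in data.items() if struct["low"] <= v <= struct["high"])


exact_rows = exact(7)
approx_rows = approximate(structure)
```

In this toy setup the exact answer contains row 4 while the approximate one does not, which shows the accuracy cost of evaluating through the formula alone.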
Crossover Operators • Database • Drill-Through: Which data are represented by these patterns? • Data-Covering: Which data from this dataset can be represented by this pattern? • Pattern Base • Pattern-Covering: Which of these patterns represent this dataset?
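The three operators can be sketched as plain Python functions, evaluated approximately through the formula. The `Pattern` tuple, the function names, and the reading of pattern-covering as "the formula holds for every record of the dataset" are assumptions for illustration; the slides only state the questions the operators answer.

```python
from typing import Callable, List, NamedTuple


class Pattern(NamedTuple):
    pid: int
    source: List[float]               # raw data the pattern was extracted from
    formula: Callable[[float], bool]  # approximate pattern-data relation


def drill_through(patterns: List[Pattern]) -> List[float]:
    """Which data are represented by these patterns?"""
    out: List[float] = []
    for p in patterns:
        for x in p.source:
            if p.formula(x) and x not in out:
                out.append(x)
    return out


def data_covering(pattern: Pattern, dataset: List[float]) -> List[float]:
    """Which data from this dataset can be represented by this pattern?"""
    return [x for x in dataset if pattern.formula(x)]


def pattern_covering(patterns: List[Pattern], dataset: List[float]) -> List[int]:
    """Which of these patterns represent this dataset (assumed: every record)?"""
    return [p.pid for p in patterns if all(p.formula(x) for x in dataset)]


values = [1.0, 3.0, 6.0, 8.0]
low = Pattern(1, values, lambda x: x <= 4.0)
high = Pattern(2, values, lambda x: x >= 5.0)

drilled = drill_through([low])
covered = data_covering(high, [2.0, 7.0])
covering = pattern_covering([low, high], [6.0, 8.0])
```

The first two operators return data, so they belong to the database side; the third returns pattern identifiers, so it belongs to the pattern base side.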
Query Example Drill-through( { p | p intersects q})
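The query above composes a pattern base selection with a drill-through. A minimal sketch, assuming interval-shaped patterns, an `intersects` test defined as interval overlap, and approximate evaluation through the interval itself (all of these are illustrative assumptions):

```python
# Pattern base: pid -> (low, high) interval structure (assumed representation)
patterns = {1: (0.0, 4.0), 2: (3.0, 7.0), 3: (8.0, 10.0)}
data = [1.0, 3.5, 6.0, 9.0]

q = (3.5, 5.0)  # hypothetical query pattern


def intersects(a, b):
    """Two intervals intersect if neither ends before the other starts."""
    return a[0] <= b[1] and b[0] <= a[1]


# { p | p intersects q }
selected = [pid for pid, s in patterns.items() if intersects(s, q)]

# Drill-through over the selected patterns
result = [
    x for x in data
    if any(patterns[pid][0] <= x <= patterns[pid][1] for pid in selected)
]
```

The inner set is computed entirely in the pattern space; only the final step touches the raw data.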
Summary • Formal specification of basic PBMS concepts • Investigation of the representation of the pattern-data relation • Formal definition of query operators
Future Work • Query language • Generic similarity measures • Efficient implementation of intermediate mappings • Statistical measures for the patterns