This paper explores the challenges of answering aggregate queries in a peer-to-peer OLAP system and proposes solutions based on fact and dimension integration. The architecture involves autonomous peer data management, three-phase peer-to-peer coordination, and cooperative query answering.
Aggregate Queries in Peer-to-Peer OLAP • Mauricio Minuto Espil • Faculty of Engineering • Universidad Católica Argentina • Alejandro A. Vaisman • Computer Science Department • Universidad de Buenos Aires 7th International Workshop on Data Warehousing & OLAP
Aggregate Queries in Peer-to-Peer OLAP OUTLINE: • CHARACTERIZATION • PROBLEM AND PROPOSAL • FACT INTEGRATION • DIMENSION INTEGRATION • AGGREGATE QUERIES • CONCLUSIONS
Peer-to-Peer Systems MAIN CHARACTERISTICS: • Involves a network of interconnected peer systems; • The network topology is not relevant; • Each peer maintains full autonomy over its own data resources; • Each peer may assume the role of local. The rest become acquaintances of the local peer; • The roles of local and acquaintance among peers are not static; they are functional and are determined with respect to an operation.
Peer-to-Peer Data Management MAIN CHARACTERISTICS: • No global schema is assumed to exist for data; • Each peer must manage its data according to its own perspective; • A query may be posed on any peer; the responding peer becomes local with respect to the query; • Answers to queries must reflect a best-effort attempt to gather data from all peers; • Answers to queries posed by local peer users must conform to the view those users have of their data; • Peers must cooperate in maintaining the local views of data.
OLAP Data in a Peer-to-Peer System THE PROBLEM: • OLAP data is essentially multidimensional; • Multidimensional data consists of a collection of views of base and derived aggregated data, describing fact indicators by dimensions of analysis; • Concepts for aggregation within dimensions are obtained from finer-grained concepts through hierarchies; • Different peers may have related fact indicators described by different dimension hierarchies; • Integration is needed: any summary concept that appears in a hierarchy of an acquaintance peer must be transformed into a summary concept meaningful to the local peer.
OLAP Data in a Peer-to-Peer System THE PROBLEM (CONTINUED): • The expected integration is not always possible; • Users may pose OLAP queries in a local peer expecting results involving all relevant data stored in all peers; • Local queries must be propagated among the acquaintances; • A rewriting of the propagated queries is needed to conform to the view of the local user; • The rewriting technique must accomplish the data integration on the fly; • Incomplete and uncertain results must be admitted.
Peer-to-Peer OLAP MODEL (DEFINES): • FACT PEERS • DIMENSION PEERS • AGGREGATE P2P OLAP QUERIES • COMPLETE AND CERTAIN QUERY ANSWERS ARCHITECTURE (INVOLVES): • AUTONOMOUS PEER DATA MANAGEMENT • THREE PHASE PEER TO PEER COORDINATION • COOPERATIVE QUERY ANSWERING
Fact Integration TYPES OF FACT: • GENERIC FACT • FACT PEERS (RELATED TO THE GENERIC FACT BY AN IS-A RELATIONSHIP) FACT CONCILIATION PHASE: • SOURCE PEER: PUBLISHES GENERIC FACT DEFINITION AND DIMENSIONAL STRUCTURE • LISTENING PEER: GENERIC FACT AGREEMENT AND DIMENSION PEERS DEFINITION
Dimension Integration INVOLVES: • A PAIR OF DIMENSION PEERS CONSISTS OF: • LEVEL HIERARCHY INTEGRATION • MEMBER HIERARCHY INTEGRATION COMPRISES: • CORRESPONDENCE DEFINITION AMONG DIMENSION LEVELS • REVISION/MAPPING DEFINITION AMONG DIMENSION INSTANCES
Level Hierarchy Integration LEVEL CORRESPONDENCE • APPLIES TO SCHEMAS • ESTABLISHES HOW A PAIR OF LEVELS ON DIFFERENT PEER DIMENSIONS ARE RELATED • IS PRODUCED/UPDATED DURING A SCHEMA CONCILIATION PHASE • IS MATERIALIZED AS METADATA IN CORRESPONDENCE TABLES
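Level Hierarchy Integration ORDER PRESERVING LEVEL CORRESPONDENCE [Figure: two level hierarchies, each topped by All, joined by an order-preserving correspondence. Level labels in the figure: Tax Discharge Category, Benefit Type, Charity Modality, Funding Class, Loan Type.]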
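Level Hierarchy Integration A LEVEL CORRESPONDENCE THAT DOES NOT PRESERVE ORDER IS NOT ADMISSIBLE [Figure: two level hierarchies, each topped by All, joined by a correspondence marked WRONG. Level labels in the figure: Tax Discharge Category, Benefit Type, Charity Modality, Funding Class, Loan Types.]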
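The order-preservation requirement on level correspondences can be illustrated with a small check. This is only a sketch under stated assumptions: each hierarchy is taken to be a simple chain listed from the finest level up to All, the correspondence is a dictionary from levels of one peer to levels of the other, and "order preserving" is read strictly (a lower level must map to a strictly lower level). The function name `is_order_preserving` and the way the slide's level names are split between the two peers are hypothetical.

```python
from itertools import combinations

def is_order_preserving(levels_a, levels_b, correspondence):
    """Return True if the correspondence never inverts the level order.

    levels_a / levels_b: levels of each peer, listed from finest to All
    (assumption: the hierarchies are simple chains).
    correspondence: dict mapping a level of peer A to a level of peer B.
    """
    pos_a = {level: i for i, level in enumerate(levels_a)}
    pos_b = {level: i for i, level in enumerate(levels_b)}
    # Visit corresponding levels of peer A from finest to coarsest.
    mapped = sorted(correspondence, key=pos_a.__getitem__)
    for lower, upper in combinations(mapped, 2):
        # 'lower' is below 'upper' in A, so its image must be below in B.
        if pos_b[correspondence[lower]] >= pos_b[correspondence[upper]]:
            return False
    return True

# Hypothetical arrangement of the level names that appear on the slide.
peer_a = ["Loan Type", "Funding Class", "All"]
peer_b = ["Charity Modality", "Benefit Type", "Tax Discharge Category", "All"]

good = {"Loan Type": "Charity Modality", "Funding Class": "Benefit Type", "All": "All"}
bad = {"Loan Type": "Benefit Type", "Funding Class": "Charity Modality", "All": "All"}

print(is_order_preserving(peer_a, peer_b, good))  # True
print(is_order_preserving(peer_a, peer_b, bad))   # False
```

In this sketch the second correspondence is rejected because Loan Type lies below Funding Class in one peer while their images appear in the opposite order in the other.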
Member Hierarchy Integration INTEGRATION BY MAPPING • APPLIES TO INSTANCES • ESTABLISHES HOW A PAIR OF MEMBERS OF CORRESPONDING LEVELS ARE RELATED • IS PRODUCED/UPDATED DURING A MAPPING ACQUISITION PHASE • MUST BE PRECEDED BY AT LEAST ONE SCHEMA CONCILIATION PHASE • IS MATERIALIZED AS METADATA IN MAPPING TABLES
Member Hierarchy Integration MAPPINGS: HOMOMORPHISM PROPERTY [Figure: diagram relating l:m and l':m' through roll-up and map.] For each member m of a level l such that map(l:m) is defined, if there exists some member m' of a level l' satisfying roll-up(l:m) = l':m', and level l' is in dom(Correspondence), then roll-up(map(l:m)) = map(l':m').
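The homomorphism property can be checked mechanically once roll-ups and mappings are available as tables. The sketch below is illustrative only: roll-ups and the mapping are modeled as dictionaries keyed by (level, member) pairs, `corresponding_levels` stands for dom(Correspondence), and the function name and sample members are hypothetical rather than taken from the paper.

```python
def homomorphism_violations(src_rollup, dst_rollup, mapping, corresponding_levels):
    """List the mapped members for which roll-up(map(l:m)) != map(l':m').

    src_rollup: roll-up table of the peer whose members are being mapped.
    dst_rollup: roll-up table of the peer that receives the images.
    mapping:    dict from (level, member) to (level, member) across peers.
    corresponding_levels: levels playing the role of dom(Correspondence).
    """
    violations = []
    for child, image in mapping.items():
        parent = src_rollup.get(child)  # roll-up(l:m) = l':m'
        if parent is None or parent[0] not in corresponding_levels:
            continue  # the property imposes nothing for this member
        expected = mapping.get(parent)  # map(l':m')
        actual = dst_rollup.get(image)  # roll-up(map(l:m))
        if expected is None or actual != expected:
            violations.append(child)
    return violations

# Hypothetical data: m1 and m2 share a parent, but their images do not.
src_rollup = {("branch", "m1"): ("region", "m'"), ("branch", "m2"): ("region", "m'")}
dst_rollup = {("branch", "a"): ("region", "X"), ("branch", "b"): ("region", "Y")}
mapping = {
    ("branch", "m1"): ("branch", "a"),
    ("branch", "m2"): ("branch", "b"),
    ("region", "m'"): ("region", "X"),
}

print(homomorphism_violations(src_rollup, dst_rollup, mapping, {"branch", "region"}))
# [('branch', 'm2')]
```

Whichever image is chosen for the shared parent in this data, one of the two children violates the property, which is the kind of conflict that mapping alone cannot resolve.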
Member Hierarchy Integration HOMOMORPHISM MAY NOT ALWAYS HOLD [Figure: members l:m1 and l:m2, both mapped, roll up to the conflicting member l':m'.] Member m' of level l' is conflicting: it cannot be mapped. An approach based exclusively on mapping is not always effective.
Member Hierarchy Integration MAPPINGS DO NOT SUFFICE: REVISIONS MAY BE NECESSARY [Figure: LOCAL and ACQUAINTANCE hierarchies with members l:m1, l:m2 and the conflicting member l':m'.] REVISIONS AFFECT ONLY THE VIEW A PEER HAS OF THE HIERARCHY OF ITS ACQUAINTANCE
Member Hierarchy Integration EXAMPLE OF A REVISION: CONFLICTING MEMBER SPLIT [Figure: LOCAL and ACQUAINTANCE hierarchies with members l:m1, l:m2 and the non-conflicting members labeled l':m2' and l:m1'.] A REVISION BY SPLITTING MAY BE USED TO REPAIR CONFLICTS, GIVING WAY TO MAPPABLE MEMBERS
Member Hierarchy Integration EXAMPLE OF A REVISION: CONFLICTING MEMBER RECLASSIFICATION [Figure: LOCAL and ACQUAINTANCE hierarchies with members l:m1, l:m2, l:m3 and the non-conflicting members labeled l':m" and l:m'.] A REVISION BY RECLASSIFYING MAY BE AN ALTERNATIVE TO RESTORE HOMOMORPHISM
Member Hierarchy Integration REVISE AND MAP APPROACH: LOCAL PEER: • PRODUCES AND BROADCASTS REVISION AND MAPPING DEFINITIONS TO POTENTIAL ACQUAINTANCES ACQUAINTANCE: • REVISES ITS OWN HIERARCHIES, PRODUCING A REVISED INSTANCE (REVISED ROLL-UPS) WITH RESPECT TO THE LOCAL PEER • STORES INFORMATION ON MAPPINGS IN METADATA MAPPING TABLES
Member Hierarchy Integration BOTTOM-UP COMPLETION APPROACH [Figure: members l:m1 and l:m2 are mapped; l':m2' is a non-mapped member, which leaves an incomplete roll-up.] Whenever some member m2' of a level l' is not mapped, a bottom-up completion approach for query answering is employed: information on non-mapped members and their roll-ups is stored in metadata completion tables.
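One way to picture a completion table is as a record of each non-mapped member together with the members that roll up into it. The slides do not specify the layout of these tables, so the structure below, the function name `build_completion_table`, and the sample members are all assumptions made for illustration.

```python
def build_completion_table(rollup, mapping):
    """Collect non-mapped parent members and the children rolling up to them.

    rollup:  dict from (level, member) to its parent (level, member).
    mapping: dict whose keys are the members that have an image in the
             other peer (the images themselves are not needed here).
    """
    table = {}
    for child, parent in rollup.items():
        if parent not in mapping:
            # The parent has no image, so this roll-up edge is incomplete.
            table.setdefault(parent, []).append(child)
    return table

# Hypothetical data mirroring the slide: m2' is the non-mapped member.
rollup = {("l", "m1"): ("l'", "m1'"), ("l", "m2"): ("l'", "m2'")}
mapping = {
    ("l", "m1"): ("l", "a"),
    ("l", "m2"): ("l", "b"),
    ("l'", "m1'"): ("l'", "X"),
}

print(build_completion_table(rollup, mapping))
# {("l'", "m2'"): [('l', 'm2')]}
```

A query answer that needs data aggregated at level l' could consult such a table to tell which lower-level members contributed through a non-mapped parent, and flag the affected part of the result as incomplete or uncertain.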
P2P OLAP Queries Syntactical Structure (Datalog Style): query(Z1, ..., Zn, aggr(M), Set of Peers) :- Generic Fact(X1, ..., Xn, M), rollup dimension d1 from bottom level to desired level l1 (X1, Z1), ..., rollup dimension dn from bottom level to desired level ln (Xn, Zn);
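Read operationally, the query structure groups fact tuples by the rolled-up coordinates Z1, ..., Zn and aggregates the measure M per group. The following sketch evaluates such a query over a single fact table. It is an assumption-laden illustration: each roll-up from the bottom level to the desired level is given as a plain dictionary, the Set of Peers argument is ignored, and the function name and sample data are hypothetical.

```python
from collections import defaultdict

def evaluate_query(facts, rollups, aggr=sum):
    """Evaluate query(Z1..Zn, aggr(M)) over one fact table.

    facts:   iterable of tuples (X1, ..., Xn, M) at the bottom levels.
    rollups: one dict per dimension, mapping a bottom-level member Xi
             to its ancestor Zi at the desired level li.
    aggr:    aggregate function applied to the measures of each group.
    """
    groups = defaultdict(list)
    for *coords, measure in facts:
        # Roll every coordinate up to its desired level.
        key = tuple(rollup[x] for rollup, x in zip(rollups, coords))
        groups[key].append(measure)
    return {key: aggr(values) for key, values in groups.items()}

# Hypothetical fact table with two dimensions (loan, month) and one measure.
facts = [("loanA", "2003-01", 100), ("loanB", "2003-02", 50), ("loanC", "2003-01", 70)]
rollups = [
    {"loanA": "public", "loanB": "public", "loanC": "private"},
    {"2003-01": "2003", "2003-02": "2003"},
]

print(evaluate_query(facts, rollups))
# {('public', '2003'): 150, ('private', '2003'): 70}
```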
Query Evaluation Process • GENERATES A QUERY FOR EACH RELEVANT PEER (INCLUDING THE LOCAL PEER); • GENERATED QUERIES ARE PROPAGATED TO RELEVANT PEERS; • QUERIES FOR RELEVANT PEERS STEM FROM THE REWRITING OF THE SUBMITTED P2P OLAP QUERY; • THE REWRITING PROCESS INTRODUCES REFERENCES TO FACT PEERS, REVISED ROLL-UPS, AND MAPPING AND COMPLETION TABLES; • RESULTS OF PROPAGATED QUERIES ARE COLLECTED AND AGGREGATED LOCALLY TO PRODUCE THE FINAL QUERY ANSWER; • QUERY ANSWERS MAY BE UNCERTAIN AND INCOMPLETE DUE TO BOTTOM-UP COMPLETION.
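The collection step of this process can be sketched as follows. The sketch simplifies the architecture in several ways that are assumptions, not statements from the slides: each relevant peer is represented only by its partial result and its mapping table, the aggregate is SUM (so partial results can simply be added), and a member without a mapping is skipped and makes the answer incomplete instead of being handled through completion tables. All names and figures are hypothetical.

```python
def combine_partial_results(partials, mappings):
    """Merge per-peer partial SUM results into the local peer's view.

    partials: dict peer -> {member (in that peer's terms): partial sum}.
    mappings: dict peer -> {member of that peer: member of the local peer}.
    Returns (answer, complete); 'complete' is False when some member had
    no mapping and its data could not be placed in the local view.
    """
    answer = {}
    complete = True
    for peer, partial in partials.items():
        mapping = mappings[peer]
        for member, value in partial.items():
            local_member = mapping.get(member)
            if local_member is None:
                complete = False  # non-mapped member: its data is left out
                continue
            answer[local_member] = answer.get(local_member, 0) + value
    return answer, complete

# Hypothetical partial results from the local peer and one acquaintance.
partials = {"local": {"public": 150}, "acq1": {"state": 40, "ngo": 10}}
mappings = {"local": {"public": "public"}, "acq1": {"state": "public"}}

print(combine_partial_results(partials, mappings))
# ({'public': 190}, False)
```

Adding partial results is only valid for distributive aggregates such as SUM or COUNT; an aggregate like AVG would need each peer to return both a sum and a count.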
Query Processing [Figure: query processing flow between the Local Peer and a Relevant Peer. Labels in the figure: QUERY, Rewriting, Metadata, Mapping tables, Completion tables, Revised Rollups, Fact tables, Evaluation, Partial Result, Integration, Answer.]
Aggregate Queries in Peer-to-Peer OLAP CONCLUSIONS: MAIN POINTS DISCUSSED • GENERIC FACTS • FACT CONCILIATION PHASE • HIERARCHY LEVEL CORRESPONDENCE • SCHEMA CONCILIATION PHASE • REVISE AND MAP APPROACH • BOTTOM-UP COMPLETION • MAPPING ACQUISITION PHASE • P2P OLAP QUERIES • QUERY REWRITING AND EVALUATION