A Schema Integration Framework over Super-Peer based Network Hao Ding NTNU/IDI SCC-PTPSC 2004, Shanghai, China
Agenda
• Motivation
• Goals and Objectives
• Related Work
• State-of-the-art
• Platform and Technical Suggestions
• Conclusion
Motivation: A Dilemma
• On one hand, many retrieval systems have accumulated over time. Different providers built them using different schemas.
• On the other hand, users prefer to access all of this heterogeneous data through a unified interface.
Goals and Objectives
• Unified access to a more complete view of domain-specific (content-related) information that is distributed around the world in various forms.
Background Knowledge
• OAI-PMH
• P2P protocols, e.g., JXTA
• Semantic Web and ontology theory, e.g., the JENA Semantic Web framework
  • RDF, OWL
Related Work: Resource Integration
• Bibliographic-metadata-based method, e.g., the URL recorded in MARC field 856
• Database browsing and navigation method, e.g., subject-based browsing
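The bibliographic-metadata method depends on links stored in the records themselves. The sketch below collects URLs from subfield $u of every 856 field. It does not use a real MARC parser. Instead it assumes a simplified, hypothetical in-memory record: a dict from MARC tag to a list of fields, where each field is a list of (subfield code, value) pairs.

```python
from typing import Dict, List, Tuple

# Hypothetical simplified MARC record: tag -> list of fields,
# each field being a list of (subfield_code, value) pairs.
Field = List[Tuple[str, str]]
Record = Dict[str, List[Field]]


def electronic_locations(record: Record) -> List[str]:
    """Return every URI stored in subfield $u of the record's 856 fields."""
    urls: List[str] = []
    for field in record.get("856", []):
        for code, value in field:
            if code == "u":
                urls.append(value)
    return urls


# Illustrative record; the URLs are placeholders.
record: Record = {
    "245": [[("a", "A sample title")]],
    "856": [
        [("u", "https://example.org/fulltext.pdf"), ("z", "Full text")],
        [("u", "https://example.org/abstract.html")],
    ],
}
print(electronic_locations(record))
```

In this sketch, records without an 856 field simply yield an empty list.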
Related Work (cont'd)
• Global-as-View (GaV) method, e.g., GaV in NSDL, which accommodates 9 metadata formats
[Figure: source databases DB1 … DBn each expose local metadata (MD). The local MD is linked and integrated into a global-view MD set, and queries posed against the global view return the results.]
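In GaV, each element of the global schema is defined as a view over the sources. A query against the global view is therefore answered by evaluating that view definition over the local data. The sketch below is minimal and rests on assumptions: two hypothetical sources with their own field names, and a global view limited to title and creator. These mappings are illustrative only and are not NSDL's actual crosswalks.

```python
from typing import Dict, List

Record = Dict[str, str]

# Hypothetical local metadata from two sources with different schemas.
SOURCES: Dict[str, List[Record]] = {
    "db1": [{"ti": "P2P Search", "au": "Smith"}],
    "db2": [{"Title": "Schema Mapping", "Author": "Jones"}],
}

# GaV mapping: for each source, which local field supplies each global field
# (illustrative mappings only).
GAV_MAPPINGS: Dict[str, Dict[str, str]] = {
    "db1": {"title": "ti", "creator": "au"},
    "db2": {"title": "Title", "creator": "Author"},
}


def global_view() -> List[Record]:
    """Build the global view as the union of all mapped source records."""
    view: List[Record] = []
    for source, records in SOURCES.items():
        mapping = GAV_MAPPINGS[source]
        for rec in records:
            view.append({g: rec[local] for g, local in mapping.items()})
    return view


def query(field: str, value: str) -> List[Record]:
    """Answer a global query by evaluating it over the global view definition."""
    return [r for r in global_view() if r.get(field) == value]


print(query("creator", "Jones"))
```

The sketch shows the cost of GaV: adding a source means writing a new mapping for it, and changing the global schema means revisiting every mapping.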
Related Work (cont'd)
• Local-as-View (LaV) method, e.g., MetaCrawler
[Figure: a query goes to a query-distribution and result-integration component. That component distributes the query to source databases DB1 … DBn and integrates their results.]
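The query-distribution and result-integration step in the figure can be sketched as a metasearch fan-out. The query is sent to every source, and the returned result lists are merged with duplicates dropped. This is a minimal illustration under stated assumptions, not MetaCrawler's actual algorithm. Each source is a hypothetical function that returns ranked URLs, and merging uses simple round-robin interleaving.

```python
from itertools import zip_longest
from typing import Callable, Dict, List

# A source is assumed to be a function from a query string to ranked URLs.
Source = Callable[[str], List[str]]


def distribute(query: str, sources: Dict[str, Source]) -> List[List[str]]:
    """Send the same query to every source and collect each result list."""
    return [search(query) for search in sources.values()]


def integrate(result_lists: List[List[str]]) -> List[str]:
    """Interleave the ranked lists round-robin, keeping first occurrences only."""
    merged: List[str] = []
    seen = set()
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Hypothetical sources standing in for DB1 ... DBn.
sources: Dict[str, Source] = {
    "db1": lambda q: ["u1", "u2", "u3"],
    "db2": lambda q: ["u2", "u4"],
}
print(integrate(distribute("schema integration", sources)))
```

Round-robin keeps each source's top results near the top of the merged list. A real integrator would need a proper ranking strategy, which remains an open problem.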
State-of-the-art
• Problem statements
  • Various forms: flat files, databases, schema-based libraries, etc.
  • Semantic ambiguities: the key characteristic of data integration is not so much its volume as its diversity, heterogeneity, and dispersion (Rechemann, 2000).
  • Scalability
  • Authentication
State-of-the-art (cont'd)
• Scenario of information searching in a P2P environment
[Figure: peer groups a0, a1, a2; b0 to b3; and c0, c1, c2. A request passes through the steps 1. request, 2. find, 3. find, 4. reply, 5. ack, and involves semantic-based negotiation and harvesting.]
• Legend:
  • I: shared schema
  • II: different schemas but the same community
  • III: different schemas and different communities
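The three cases in the legend differ in how much semantic work a request needs. The sketch below shows how a super-peer could classify a candidate peer relative to the requesting peer. It is hypothetical: the `Peer` fields, the schema and community labels, and the comments on what each case implies are illustrative assumptions, not part of the framework's specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    name: str
    schema: str     # hypothetical schema identifier
    community: str  # hypothetical community identifier


def classify(requester: Peer, target: Peer) -> str:
    """Return case I, II, or III from the legend of the scenario figure."""
    if requester.schema == target.schema:
        return "I"    # shared schema (assumed: no mapping needed)
    if requester.community == target.community:
        return "II"   # different schema, same community
    return "III"      # different schema and different community


a0 = Peer("a0", schema="DC", community="library")
a1 = Peer("a1", schema="DC", community="library")
c0 = Peer("c0", schema="MARC", community="library")
b0 = Peer("b0", schema="EMBL", community="bioinformatics")
print(classify(a0, a1), classify(a0, c0), classify(a0, b0))
```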
State-of-the-art (cont'd)
• One conventional approach adopts OAI-PMH to harvest heterogeneous resources, all of which must support a common metadata set, e.g., Dublin Core (DC).
[Figure: data providers (arXiv, NSDL, OCLC's Experimental Thesis Catalog, …) are harvested by service providers (Arc, Kepler, …), which serve users.]
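In the conventional setup, a service provider harvests each data provider over HTTP. The sketch below uses only the Python standard library. It builds a ListRecords request with `metadataPrefix=oai_dc` and extracts the Dublin Core titles and the resumptionToken from a response. The base URL and the sample response are hypothetical, and the sample omits the record header for brevity. A real harvester would fetch each URL and keep looping until no resumptionToken remains.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request for unqualified Dublin Core."""
    if token:
        # resumptionToken is an exclusive argument: send it with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_titles(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return the dc:title values and the resumptionToken, if any."""
    root = ET.fromstring(xml_text)
    titles = [t.text or "" for t in root.iterfind(".//dc:title", NS)]
    token_el = root.find(".//oai:resumptionToken", NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return titles, token


# Hypothetical, abbreviated response.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>A sample e-print</dc:title>
      </oai_dc:dc>
    </metadata></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://example.org/oai"))
print(parse_titles(SAMPLE))
```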
State-of-the-art (cont'd)
• In our approach:
  • Peers are wrapped with the harvesting protocol so that their content can be retrieved.
  • The service provider is removed, so there is no longer a mediator.
• Advantages:
  • Data are always up to date.
  • Data providers can join and leave freely.
  • Queries can reach all available data providers.
  • The query mechanism can also be extended to let users choose their preferred data providers.
• Challenges:
  • No control over the quality of the data providers.
  • Limited scalability because of query flooding.
[Figure: data providers (NSDL, OCLC's Experimental Thesis Catalog, arXiv, Kepler, Arc) connected directly to users through a P2P network.]
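The scalability challenge comes from flooding: every peer forwards a query to its neighbours until a hop limit (TTL) expires. The sketch below assumes a Gnutella-style flood over a hypothetical neighbour graph. It counts how many messages one query generates and which peers the query reaches.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[str, List[str]]


def flood(graph: Graph, origin: str, ttl: int) -> Tuple[Set[str], int]:
    """Flood a query from origin and return (peers reached, messages sent).

    A peer that receives the query for the first time forwards it to all
    neighbours except the sender, as long as hops remain. Duplicate
    arrivals still cost a message but are not forwarded again.
    """
    reached: Set[str] = {origin}
    messages = 0
    queue: deque = deque([(origin, None, ttl)])  # (peer, sender, hops left)
    while queue:
        peer, sender, hops = queue.popleft()
        if hops == 0:
            continue
        for neighbour in graph[peer]:
            if neighbour == sender:
                continue
            messages += 1
            if neighbour not in reached:
                reached.add(neighbour)
                queue.append((neighbour, peer, hops - 1))
    return reached, messages


# Hypothetical four-peer network.
graph: Graph = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b", "d"],
    "d": ["b", "c"],
}
reached, messages = flood(graph, "a", ttl=2)
print(sorted(reached), messages)
```

In this small network, a TTL of 2 reaches all four peers but sends six messages, three of them duplicates. The message count follows the number of links within the TTL radius rather than the number of peers reached, and that overhead is why unrestricted flooding limits scalability.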
State-of-the-art (cont'd)
• Key problems:
  • Topology of the system infrastructure: connected graphs, not only hierarchies. "Hierarchies represent the limitation of the human view of complex structures" (Keith, Infosam 2004).
  • Autonomous understanding of complex semantics:
    • Domain ontologies to provide supporting metadata for interoperability
    • Upper-level ontologies as a foundation for more specific domain ontologies
[Figure: metadata records (MDi) are passed through interpreters to the JENA inference engine, which generates relationships among MD records.]
• Semantic Web framework: adopting JENA
• Ontology language: OWL (compatible with JENA)
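One kind of relationship an inference engine can derive among MD records comes from the semantics of owl:sameAs, which is symmetric and transitive. The sketch below is plain Python, not JENA's API. It assumes hypothetical record identifiers and sameAs links already asserted by the interpreters, and it computes the resulting equivalence groups with union-find.

```python
from typing import Dict, List, Tuple


def same_as_groups(links: List[Tuple[str, str]]) -> List[List[str]]:
    """Group records connected by owl:sameAs links (symmetric and transitive)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups: Dict[str, List[str]] = {}
    for record in parent:
        groups.setdefault(find(record), []).append(record)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical record identifiers and asserted sameAs links.
links = [("arxiv:1", "nsdl:7"), ("nsdl:7", "oclc:3"), ("inex:9", "arc:2")]
print(same_as_groups(links))
```

Note that arxiv:1 and oclc:3 end up in the same group even though no link connects them directly. This is the kind of implicit relationship the inference step is meant to surface.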
Platform and Techniques
• Platform: the JXTA protocol for building the P2P environment
• Semantic Web framework: adopting JENA
• Ontology language: OWL (compatible with JENA)
• Inference engine: Jess or Japster (survey pending)
• Test domain: bibliographic records from the INEX collection
  • Further test collections will be selected for their differences in content, format, and access mechanism, e.g., SWISS-PROT, EMBL, etc.
• Upper-level ontology: UMLS
• Domain-specific ontologies
Conclusions
• A scenario for a complete view of heterogeneous resources
• Problem statements and state of the art
• Proposed platform and technical suggestions
• Other open problems:
  • Result integration and ranking
  • Data provider location
  • Query decomposition algorithms